AITopics | Grammars & Parsing

Grammars & Parsing

NLP Workbench: Efficient and Extensible Integration of State-of-the-art Text Mining Tools

Yao, Peiran, Kosmajac, Matej, Waheed, Abeer, Guzhva, Kostyantyn, Hervieux, Natalie, Barbosa, Denilson

arXiv.org Artificial Intelligence, Mar-2-2023

NLP Workbench is a web-based platform for text mining that allows non-expert users to obtain semantic understanding of large-scale corpora using state-of-the-art text mining models. The platform is built upon the latest pre-trained models and open source systems from academia that provide semantic analysis functionalities, including but not limited to entity linking, sentiment analysis, semantic parsing, and relation extraction. Its extensible design enables researchers and developers to smoothly replace an existing model or integrate a new one. To improve efficiency, we employ a microservice architecture that facilitates allocation of acceleration hardware and parallelization of computation. This paper presents the architecture of NLP Workbench and discusses the challenges we faced in designing it. We also discuss diverse use cases of NLP Workbench and the benefits of using it over other approaches. The platform is under active development, with its source code released under the MIT license. A website and a short video demonstrating our platform are also available.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2303.0141

Country:

North America > Canada > Alberta (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
(13 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Data Science > Data Mining > Text Mining (0.84)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

UzbekTagger: The rule-based POS tagger for Uzbek language

Sharipov, Maksud, Kuriyozov, Elmurod, Yuldashev, Ollabergan, Sobirov, Ogabek

arXiv.org Artificial Intelligence, Mar-1-2023

This research paper presents a part-of-speech (POS) annotated dataset and tagger tool for the low-resource Uzbek language. The dataset includes 12 tags, which were used to develop a rule-based POS-tagger tool. The corpus text used in the annotation process was balanced over 20 different fields to ensure its representativeness. Uzbek is an agglutinative language, so most of the words in an Uzbek sentence are formed by adding suffixes. This makes POS tagging difficult, because the tagger must find the stem of each word and the part of speech it belongs to. The methodology proposed in this research stems words with an affix/suffix-stripping approach backed by a database of the stem forms of Uzbek words. The tagger tool was tested on the annotated dataset and showed high accuracy in identifying and tagging parts of speech in Uzbek text. This newly presented dataset and tagger tool can be used for a variety of natural language processing tasks such as language modeling, machine translation, and text-to-speech synthesis. The presented dataset is the first of its kind to be made publicly available for Uzbek, and the POS-tagger tool can also serve as a base for other closely related Turkic languages.
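
As a rough illustration of the suffix-stripping idea described in this abstract, the sketch below strips known suffixes from a word until the remainder is found in a stem lexicon, then returns that stem's tag. It is not the authors' implementation: the tiny lexicon, the suffix list, the tag names, and the function name `tag_word` are all assumptions made for the example.

```python
# Toy stem lexicon and suffix list (hypothetical; a real tagger would load
# a full database of Uzbek stems and a much richer suffix inventory).
STEM_LEXICON = {"kitob": "NOUN", "maktab": "NOUN", "yoz": "VERB", "katta": "ADJ"}
SUFFIXES = ("lar", "ning", "ni", "da", "im", "di")


def tag_word(word):
    """Return (stem, tag) by stripping suffixes until a known stem remains.

    Returns None when no sequence of suffix removals leads to a known stem.
    """
    word = word.lower()
    if word in STEM_LEXICON:
        return word, STEM_LEXICON[word]
    # Try longer suffixes first so that "ning" is preferred over shorter ones.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            result = tag_word(word[: -len(suffix)])
            if result is not None:
                return result
    return None


if __name__ == "__main__":
    for w in ["kitoblarimni", "maktabda", "yozdi", "xyz"]:
        print(w, "->", tag_word(w))
```

Looking the stem up after every stripping step is what keeps the rule set small: the suffix rules only need to reach a known stem, and the lexicon supplies the tag.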

artificial intelligence, natural language, text processing, (17 more...)

arXiv.org Artificial Intelligence

2301.12711

Country:

Asia > Uzbekistan (0.05)
Europe > Spain (0.04)
Asia > India (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Binding Language Models in Symbolic Languages

Cheng, Zhoujun, Xie, Tianbao, Shi, Peng, Li, Chengzu, Nadkarni, Rahul, Hu, Yushi, Xiong, Caiming, Radev, Dragomir, Ostendorf, Mari, Zettlemoyer, Luke, Smith, Noah A., Yu, Tao

arXiv.org Artificial Intelligence, Feb-28-2023

Though end-to-end neural approaches have recently been dominating NLP tasks in both performance and ease-of-use, they lack interpretability and robustness. We propose Binder, a training-free neural-symbolic framework that maps the task input to a program, which (1) allows binding a unified API of language model (LM) functionalities to a programming language (e.g., SQL, Python) to extend its grammar coverage and thus tackle more diverse questions, (2) adopts an LM as both the program parser and the underlying model called by the API during execution, and (3) requires only a few in-context exemplar annotations. Specifically, we employ GPT-3 Codex as the LM. In the parsing stage, with only a few in-context exemplars, Codex is able to identify the part of the task input that cannot be answered by the original programming language, correctly generate API calls to prompt Codex to solve the unanswerable part, and identify where to place the API calls while being compatible with the original grammar. In the execution stage, Codex can perform versatile functionalities (e.g., commonsense QA, information extraction) given proper prompts in the API calls. Binder achieves state-of-the-art results on WikiTableQuestions and TabFact datasets, with explicit output programs that benefit human debugging. Note that previous best systems are all finetuned on tens of thousands of task-specific samples, while Binder only uses dozens of annotations as in-context exemplars without any training. Our code is available at https://github.com/HKUNLP/Binder.
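
The general idea of binding a model call into a symbolic language can be sketched with the standard-library `sqlite3` module, which lets a Python function be called from inside SQL. This is only an illustration under stated assumptions, not Binder's actual syntax or API: the function name `ask_lm`, the table, and the `fake_lm` stub (a lookup table standing in for a real language-model call) are all hypothetical.

```python
import sqlite3


def fake_lm(question, value):
    """Stand-in for a language-model call (hypothetical stub).

    A real system would prompt an LM with the question and the cell value.
    """
    european = {"France", "Spain"}
    if question == "Is this country in Europe?":
        return "yes" if value in european else "no"
    return "unknown"


def run_demo():
    conn = sqlite3.connect(":memory:")
    # Expose the stub to SQL as a two-argument function named ask_lm.
    conn.create_function("ask_lm", 2, fake_lm)
    conn.execute("CREATE TABLE players (name TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO players VALUES (?, ?)",
        [("Ana", "Spain"), ("Ben", "Canada"), ("Chloe", "France")],
    )
    # Plain SQL cannot decide "is in Europe"; the bound function fills that gap.
    rows = conn.execute(
        "SELECT name FROM players "
        "WHERE ask_lm('Is this country in Europe?', country) = 'yes' "
        "ORDER BY name"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(run_demo())
```

The query stays an ordinary, inspectable SQL program; only the part that needs outside knowledge is delegated to the bound function.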

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2210.02875

Country:

North America > United States > Kansas > Douglas County > Lawrence (0.04)
North America > United States > Kansas > Riley County > Manhattan (0.04)
North America > United States > Oklahoma > Oklahoma County > Oklahoma City (0.04)
(19 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Is Japanese CCGBank empirically correct? A case study of passive and causative constructions

Bekki, Daisuke, Yanaka, Hitomi

arXiv.org Artificial Intelligence, Feb-28-2023

The Japanese CCGBank serves as training and evaluation data for developing Japanese CCG parsers. However, since it is automatically generated from the Kyoto Corpus, a dependency treebank, its linguistic validity still needs to be sufficiently verified. In this paper, we focus on the analysis of passive/causative constructions in the Japanese CCGBank and show that, together with the compositional semantics of ccg2lambda, a semantic parsing system, it yields empirically wrong predictions for the nested construction of passives and causatives.

artificial intelligence, natural language, semantic representation, (12 more...)

arXiv.org Artificial Intelligence

2302.14708

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.25)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Interactive Log Parsing via Light-weight User Feedback

Wang, Liming, Xie, Hong, Li, Ye, Tan, Jian, Lui, John C. S.

arXiv.org Artificial Intelligence, Feb-27-2023

Template mining is one of the foundational tasks to support log analysis, which supports the diagnosis and troubleshooting of large scale Web applications. This paper develops a human-in-the-loop template mining framework to support interactive log analysis, which is highly desirable in real-world diagnosis or troubleshooting of Web applications, yet previous template mining algorithms fail to support it. We formulate three types of light-weight user feedback and, based on them, design three atomic human-in-the-loop template mining algorithms. We derive mild conditions under which the outputs of our proposed algorithms are provably correct. We also derive upper bounds on the computational complexity and query complexity of each algorithm. We demonstrate the versatility of our proposed algorithms by combining them to improve the template mining accuracy of five representative algorithms over sixteen widely used benchmark datasets.
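
To make the notion of template mining with user feedback concrete, here is a deliberately simple sketch. It is not one of the paper's algorithms: the digit-masking heuristic, the "merge these two templates" feedback type, and the names `to_template`, `mine_templates`, and `merge_templates` are assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def to_template(line):
    """Replace every token that contains a digit with the wildcard <*>."""
    return " ".join("<*>" if re.search(r"\d", tok) else tok for tok in line.split())


def mine_templates(lines):
    """Group log lines by their automatically derived template."""
    groups = defaultdict(list)
    for line in lines:
        groups[to_template(line)].append(line)
    return dict(groups)


def merge_templates(a, b):
    """Apply a hypothetical user feedback saying templates a and b are the same.

    Positions where the two templates disagree become wildcards.
    """
    tokens_a, tokens_b = a.split(), b.split()
    if len(tokens_a) != len(tokens_b):
        raise ValueError("templates must have the same number of tokens")
    return " ".join(x if x == y else "<*>" for x, y in zip(tokens_a, tokens_b))


if __name__ == "__main__":
    logs = [
        "Connected to 10.0.0.1 port 22",
        "Connected to 10.0.0.2 port 8080",
        "User alice logged in",
        "User bob logged in",
    ]
    print(sorted(mine_templates(logs)))
    print(merge_templates("User alice logged in", "User bob logged in"))
```

The automatic pass cannot tell that `alice` and `bob` are variable parts, since neither contains a digit; a single piece of light-weight feedback is enough to repair that.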

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2301.12225

Country:

North America > United States > Texas > Travis County > Austin (0.05)
Asia > China > Chongqing Province > Chongqing (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Compositional Law Parsing with Latent Random Functions

Shi, Fan, Li, Bin, Xue, Xiangyang

arXiv.org Artificial Intelligence, Feb-25-2023

Human cognition has compositionality. We understand a scene by decomposing the scene into different concepts (e.g., shape and position of an object) and learning the respective laws of these concepts, which may be either natural (e.g., laws of motion) or man-made (e.g., laws of a game). The automatic parsing of these laws indicates the model's ability to understand the scene, which makes law parsing play a central role in many visual tasks. This paper proposes a deep latent variable model for Compositional LAw Parsing (CLAP), which achieves the human-like compositionality ability through an encoding-decoding architecture to represent concepts in the scene as latent variables. CLAP employs concept-specific latent random functions instantiated with Neural Processes to capture the law of concepts. Our experimental results demonstrate that CLAP outperforms the baseline methods in multiple visual tasks such as intuitive physics, abstract visual reasoning, and scene representation. The law manipulation experiments illustrate CLAP's interpretability by modifying specific latent random functions on samples. For example, CLAP learns the laws of position-changing and appearance constancy from the moving balls in a scene, making it possible to exchange laws between samples or compose existing laws into novel laws.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2209.09115

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Law (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)

Resources for Turkish Natural Language Processing: A critical survey

Çöltekin, Çağrı, Doğruöz, A. Seza, Çetinoğlu, Özlem

arXiv.org Artificial Intelligence, Feb-25-2023

The recent (re)popularization of deep learning methods has increased the importance of and need for data even further. Similarly, other subfields of theoretical and applied linguistics have also seen a shift towards more data-driven methods. As a result, the availability of large, high-quality language data is essential for both linguistic research and practical NLP applications. In this paper, we present a comprehensive and critical survey of linguistic resources for Turkish.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10579-022-09605-4

2204.05042

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(49 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.67)
Research Report > New Finding (0.45)

Industry:

Media > News (1.00)
Education (1.00)
Government (0.67)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(5 more...)

Extracting Victim Counts from Text

Zhong, Mian, Dhuliawala, Shehzaad, Stoehr, Niklas

arXiv.org Artificial Intelligence, Feb-23-2023

Decision-makers in the humanitarian sector rely on timely and exact information during crisis events. Knowing how many civilians were injured during an earthquake is vital to allocate aid properly. Information about such victim counts is often only available within full-text event descriptions from newspapers and other reports. Extracting numbers from text is challenging: numbers have different formats and may require numeric reasoning. This renders purely string matching-based approaches insufficient. As a consequence, fine-grained counts of injured, displaced, or abused victims beyond fatalities are often not extracted and remain unseen. We cast victim count extraction as a question answering (QA) task with a regression or classification objective. We compare regex, dependency parsing, semantic role labeling-based approaches, and advanced text-to-text models. Beyond model accuracy, we analyze extraction reliability and robustness, which are key for this sensitive task. In particular, we discuss model calibration and investigate few-shot and out-of-distribution performance. Ultimately, we make a comprehensive recommendation on which model to select for different desiderata and data domains. Our work is among the first to apply numeracy-focused large language models in a real-world use case with a positive impact.
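
A minimal regex baseline of the kind this abstract mentions, and the reason it falls short, can be sketched as follows. The pattern, the list of victim types, and the function name `extract_counts` are assumptions for illustration, not the authors' baseline.

```python
import re

# Digits (optionally with thousands separators), an optional noun such as
# "people", an optional "were", then a victim type. Hypothetical pattern.
PATTERN = re.compile(
    r"(\d{1,3}(?:,\d{3})*|\d+)\s+(?:people|civilians|persons)?\s*(?:were\s+)?"
    r"(injured|killed|displaced)",
    re.IGNORECASE,
)


def extract_counts(text):
    """Return a list of (victim_type, count) pairs found by the pattern."""
    results = []
    for match in PATTERN.finditer(text):
        count = int(match.group(1).replace(",", ""))
        results.append((match.group(2).lower(), count))
    return results


if __name__ == "__main__":
    print(extract_counts("At least 1,200 people were injured and 35 killed."))
    # Spelled-out or vague quantities are missed entirely by string matching.
    print(extract_counts("Twelve civilians were displaced; dozens were injured."))
```

The second call returns an empty list: spelled-out numbers and vague quantities such as "dozens" never match, which is the kind of gap that motivates the model-based approaches compared in the paper.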

artificial intelligence, natural language, text processing, (17 more...)

arXiv.org Artificial Intelligence

2302.12367

Country:

Europe > Austria (0.04)
Asia > Middle East > Syria (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
(17 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.74)
Law Enforcement & Public Safety (0.67)
Media > News (0.66)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.55)

Prosodic features improve sentence segmentation and parsing

Nielsen, Elizabeth, Goldwater, Sharon, Steedman, Mark

arXiv.org Artificial Intelligence, Feb-23-2023

Parsing spoken dialogue presents challenges that parsing text does not, including a lack of clear sentence boundaries. We know from previous work that prosody helps in parsing single sentences (Tran et al. 2018), but we want to show the effect of prosody on parsing speech that isn't segmented into sentences. In experiments on the English Switchboard corpus, we find prosody helps our model both with parsing and with accurately identifying sentence boundaries. However, we find that the best-performing parser is not necessarily the parser that produces the best sentence segmentation performance. We suggest that the best parses instead come from modelling sentence boundaries jointly with other constituent boundaries.

artificial intelligence, end-to-end model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2302.12165

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Conversational Text-to-SQL: An Odyssey into State-of-the-Art and Challenges Ahead

Parthasarathi, Sree Hari Krishnan, Zeng, Lu, Hakkani-Tur, Dilek

arXiv.org Artificial Intelligence, Feb-21-2023

We adapt the two reranking methods from [16], query plan (QP) and schema linking (SL), and show that both methods can help improve multi-turn text-to-SQL. With accuracy on CoSQL being reported using exact-set-match accuracy (EM) and execution accuracy (EX), with T5-Large we observed: a) MT leads to 2.4% and 1.7% absolute improvement on EM and EX; b) combined reranking approaches yield 1.9% and 2.2% improvements; c) combining MT with reranking, with T5-Large we obtain improvements of 2.1% in EM and 3.7% in EX over a T5-Large PICARD baseline. This improvement is consistent on larger models, using T5-3B yielded about 1.0% in ...

Text-to-SQL is an important research topic in semantic parsing [1, 2, 3, 4, 5, 6, 7]. Spider [3] and CoSQL [5] datasets allow for making progress in complex, cross-domain, single and multi-turn text-to-SQL tasks respectively, utilizing a common set of databases, with competitive leaderboards, demonstrating the difficulty in the tasks. In contrast to Spider, CoSQL was collected as entire dialogues, and hence includes additional challenges for the text-to-SQL task in terms of integrating dialogue context. In addition to the challenges in general-purpose code generation [8, 9], where the output of the system is constrained to follow a grammar, the text-to-SQL problem is underspecified without a schema.
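
Execution accuracy (EX), one of the two metrics named in this excerpt, can be approximated by running the predicted and gold queries and comparing their results. The sketch below is a simplified assumption, not the official CoSQL or Spider evaluation script: the toy `singer` table and the function names are hypothetical, and rows are compared as unordered multisets.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """True if both queries return the same multiset of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # an invalid predicted query counts as a miss
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) query pairs whose results match."""
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?)",
        [("Ada", 25), ("Bo", 31), ("Cy", 40)],
    )
    return conn


if __name__ == "__main__":
    gold = "SELECT name FROM singer WHERE age > 30"
    pairs = [
        ("SELECT name FROM singer WHERE age >= 31", gold),  # different text, same rows
        ("SELECT name FROM singer", gold),                  # wrong rows
        ("SELEC name FROM singer", gold),                   # syntax error
    ]
    print(execution_accuracy(build_demo_db(), pairs))
```

The first pair shows why EX and exact-set-match can disagree: the two queries differ in their clauses but return identical rows, so EX counts the prediction as correct.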

artificial intelligence, cosql, natural language, (18 more...)

arXiv.org Artificial Intelligence

2302.11054

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.75)
