AITopics | Grammars & Parsing

Collaborating Authors

Grammars & Parsing

News Overviews Instructional Materials AI-Alerts Classics

Structural Transfer Learning in NL-to-Bash Semantic Parsers

Duffy, Kyle, Bhattamishra, Satwik, Blunsom, Phil

arXiv.org Artificial IntelligenceJul-31-2023

Large-scale pre-training has made progress in many fields of natural language processing, though little is understood about the design of pre-training datasets. We propose a methodology for obtaining a quantitative understanding of structural overlap between machine translation tasks. We apply our methodology to the natural language to Bash semantic parsing task (NLBash) and show that it is largely reducible to lexical alignment. We also find that there is strong structural overlap between NLBash and natural language to SQL. Additionally, we perform a study varying compute expended during pre-training on the English to German machine translation task and find that more compute expended during pre-training does not always correspond semantic representations with stronger transfer to NLBash.

accuracy, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2307.16795

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.16)
Europe > Portugal > Lisbon > Lisbon (0.05)
Europe > Belgium > Brussels-Capital Region > Brussels (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation

Chen, Yuanhao

arXiv.org Artificial IntelligenceJul-30-2023

Tone is a crucial component of the prosody of Shanghainese, a Wu Chinese variety spoken primarily in urban Shanghai. Tone sandhi, which applies to all multi-syllabic words in Shanghainese, then, is key to natural-sounding speech. Unfortunately, recent work on Shanghainese TTS (text-to-speech) such as Apple's VoiceOver has shown poor performance with tone sandhi, especially LD (left-dominant sandhi). Here I show that word segmentation during text preprocessing can improve the quality of tone sandhi production in TTS models. Syllables within the same word are annotated with a special symbol, which serves as a proxy for prosodic information of the domain of LD. Contrary to the common practice of using prosodic annotation mainly for static pauses, this paper demonstrates that prosodic annotation can also be applied to dynamic tonal phenomena. I anticipate this project to be a starting point for bringing formal linguistic accounts of Shanghainese into computational projects. Too long have we been using the Mandarin models to approximate Shanghainese, but it is a different language with its own linguistic features, and its digitisation and revitalisation should be treated as such.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2307.16199

Country:

Asia > China > Shanghai > Shanghai (0.27)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Ohio (0.04)
(2 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.63)

Add feedback

CodeLens: An Interactive Tool for Visualizing Code Representations

Guo, Yuejun, Bettaieb, Seifeddine, Hu, Qiang, Traon, Yves Le, Tang, Qiang

arXiv.org Artificial IntelligenceJul-27-2023

Representing source code in a generic input format is crucial to automate software engineering tasks, e.g., applying machine learning algorithms to extract information. Visualizing code representations can further enable human experts to gain an intuitive insight into the code. Unfortunately, as of today, there is no universal tool that can simultaneously visualise different types of code representations. In this paper, we introduce a tool, CodeLens, which provides a visual interaction environment that supports various representation methods and helps developers understand and explore them. CodeLens is designed to support multiple programming languages, such as Java, Python, and JavaScript, and four types of code representations, including sequence of tokens, abstract syntax tree (AST), data flow graph (DFG), and control flow graph (CFG). By using CodeLens, developers can quickly visualize the specific code representation and also obtain the represented inputs for models of code. The Web-based interface of CodeLens is available at http://www.codelens.org. The demonstration video can be found at http://www.codelens.org/demo.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2307.14902

Country:

Europe (0.17)
North America > United States > New York (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report (0.41)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.69)

Add feedback

Improving Aspect-Based Sentiment with End-to-End Semantic Role Labeling Model

Přibáň, Pavel, Pražák, Ondřej

arXiv.org Artificial IntelligenceJul-27-2023

This paper presents a series of approaches aimed at enhancing the performance of Aspect-Based Sentiment Analysis (ABSA) by utilizing extracted semantic information from a Semantic Role Labeling (SRL) model. We propose a novel end-to-end Semantic Role Labeling model that effectively captures most of the structured semantic information within the Transformer hidden state. We believe that this end-to-end model is well-suited for our newly proposed models that incorporate semantic information. We evaluate the proposed models in two languages, English and Czech, employing ELECTRA-small models. Our combined models improve ABSA performance in both languages. Moreover, we achieved new state-of-the-art results on the Czech ABSA.

computational linguistic, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2307.14785

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(12 more...)

Genre: Research Report (0.64)

Industry:

Consumer Products & Services > Restaurants (0.70)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Mining Clues from Incomplete Utterance: A Query-enhanced Network for Incomplete Utterance Rewriting

Si, Shuzheng, Zeng, Shuang, Chang, Baobao

arXiv.org Artificial IntelligenceJul-27-2023

Incomplete utterance rewriting has recently raised wide attention. However, previous works do not consider the semantic structural information between incomplete utterance and rewritten utterance or model the semantic structure implicitly and insufficiently. To address this problem, we propose a QUEry-Enhanced Network (QUEEN). Firstly, our proposed query template explicitly brings guided semantic structural knowledge between the incomplete utterance and the rewritten utterance making model perceive where to refer back to or recover omitted tokens. Then, we adopt a fast and effective edit operation scoring network to model the relation between two tokens. Benefiting from proposed query template and the well-designed edit operation scoring network, QUEEN achieves state-of-the-art performance on several public datasets.

machine learning, natural language, utterance, (14 more...)

arXiv.org Artificial Intelligence

2307.00866

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Shaanxi Province > Xi'an (0.05)
Asia > China > Hong Kong (0.04)
(10 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.47)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback

Contributions to the Improvement of Question Answering Systems in the Biomedical Domain

Sarrouti, Mourad

arXiv.org Artificial IntelligenceJul-25-2023

This thesis work falls within the framework of question answering (QA) in the biomedical domain where several specific challenges are addressed, such as specialized lexicons and terminologies, the types of treated questions, and the characteristics of targeted documents. We are particularly interested in studying and improving methods that aim at finding accurate and short answers to biomedical natural language questions from a large scale of biomedical textual documents in English. QA aims at providing inquirers with direct, short and precise answers to their natural language questions. In this Ph.D. thesis, we propose four contributions to improve the performance of QA in the biomedical domain. In our first contribution, we propose a machine learning-based method for question type classification to determine the types of given questions which enable to a biomedical QA system to use the appropriate answer extraction method. We also propose an another machine learning-based method to assign one or more topics (e.g., pharmacological, test, treatment, etc.) to given questions in order to determine the semantic types of the expected answers which are very useful in generating specific answer retrieval strategies. In the second contribution, we first propose a document retrieval method to retrieve a set of relevant documents that are likely to contain the answers to biomedical questions from the MEDLINE database. We then present a passage retrieval method to retrieve a set of relevant passages to questions. In the third contribution, we propose specific answer extraction methods to generate both exact and ideal answers. Finally, in the fourth contribution, we develop a fully automated semantic biomedical QA system called SemBioNLQA which is able to deal with a variety of natural language questions and to generate appropriate answers by providing both exact and ideal answers.

information retrieval, machine learning, question answering, (25 more...)

arXiv.org Artificial Intelligence

2307.13631

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
(15 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
(9 more...)

Add feedback

Holistic Exploration on Universal Decompositional Semantic Parsing: Architecture, Data Augmentation, and LLM Paradigm

Deng, Hexuan, Zhang, Xin, Zhang, Meishan, Liu, Xuebo, Zhang, Min

arXiv.org Artificial IntelligenceJul-25-2023

A long-standing objective in the fields of natural Secondly, while Stengel-Eskin et al. language understanding and computational semantics (2020) have tried to utilize the external tool Pred-is to create a structured graph of linguistic Patt (Zhang et al., 2017), which contains the relationship meaning. Various efforts have been made to between syntax and semantic information, encode semantic relations and attributes into a semantic they do not achieve any improvements. In graph--e.g., Abstract Meaning Representation contrast, we propose a data augmentation method (AMR; Banarescu et al., 2013), Universal that effectively exploits the capabilities of Pred-Conceptual Cognitive Annotation (UCCA; Abend Patt, leading to significant performance gains in and Rappoport, 2013), and Semantic Dependency relation parsing. Moreover, we have explored various Parsing formalisms (SDP; Oepen et al., 2014, approaches for these enhancements, providing 2016).

information, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2307.13424

Country:

Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Add feedback

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

Lai, Viet Dac, Salinas, Abel, Tan, Hao, Bui, Trung, Tran, Quan, Yoon, Seunghyun, Deilamsalehy, Hanieh, Dernoncourt, Franck, Nguyen, Thien Huu

arXiv.org Artificial IntelligenceJul-24-2023

Punctuation restoration is an important task in automatic speech recognition (ASR) which aim to restore the syntactic structure of generated ASR texts to improve readability. While punctuated texts are abundant from written documents, the discrepancy between written punctuated texts and ASR texts limits the usability of written texts in training punctuation restoration systems for ASR texts. This paper proposes a reinforcement learning method to exploit in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap. The experiments show that our method achieves state-of-the-art performance on the ASR test set on two benchmark datasets for punctuation restoration.

machine learning, pr model, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2307.12949

Country:

North America > United States > California (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
North America > United States > Oregon (0.04)

Genre: Research Report (0.50)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.86)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback

NusaCrowd: Open Source Initiative for Indonesian NLP Resources

Cahyawijaya, Samuel, Lovenia, Holy, Aji, Alham Fikri, Winata, Genta Indra, Wilie, Bryan, Mahendra, Rahmad, Wibisono, Christian, Romadhony, Ade, Vincentio, Karissa, Koto, Fajri, Santoso, Jennifer, Moeljadi, David, Wirawan, Cahya, Hudi, Frederikus, Parmonangan, Ivan Halim, Alfina, Ika, Wicaksono, Muhammad Satrio, Putra, Ilham Firdausi, Rahmadani, Samsul, Oenang, Yulianti, Septiandri, Ali Akbar, Jaya, James, Dhole, Kaustubh D., Suryani, Arie Ardiyanti, Putri, Rifki Afina, Su, Dan, Stevens, Keith, Nityasya, Made Nindyatama, Adilazuarda, Muhammad Farid, Ignatius, Ryan, Diandaru, Ryandito, Yu, Tiezheng, Ghifari, Vito, Dai, Wenliang, Xu, Yan, Damapuspita, Dyah, Tho, Cuk, Karo, Ichwanul Muslim Karo, Fatyanosa, Tirana Noor, Ji, Ziwei, Fung, Pascale, Neubig, Graham, Baldwin, Timothy, Ruder, Sebastian, Sujaini, Herry, Sakti, Sakriani, Purwarianti, Ayu

arXiv.org Artificial IntelligenceJul-21-2023

We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd brings the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are under-represented despite being widely spoken.

large language model, machine learning, natural language, (24 more...)

arXiv.org Artificial Intelligence

2212.09648

Country:

North America > United States > Texas > Dallas County > Dallas (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Timor-Leste (0.14)
(64 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Law (0.67)
Government (0.67)
Information Technology > Services (0.67)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(5 more...)

Add feedback

Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation

Zheng, Wenqing, Sharan, S P, Jaiswal, Ajay Kumar, Wang, Kevin, Xi, Yihan, Xu, Dejia, Wang, Zhangyang

arXiv.org Artificial IntelligenceJul-18-2023

For a complicated algorithm, its implementation by a human programmer usually starts with outlining a rough control flow followed by iterative enrichments, eventually yielding carefully generated syntactic structures and variables in a hierarchy. However, state-of-the-art large language models generate codes in a single pass, without intermediate warm-ups to reflect the structured thought process of "outline-then-detail". Inspired by the recent success of chain-of-thought prompting, we propose ChainCoder, a program synthesis language model that generates Python code progressively, i.e. from coarse to fine in multiple passes. We first decompose source code into layout frame components and accessory components via abstract syntax tree parsing to construct a hierarchical representation. We then reform our prediction target into a multi-pass objective, each pass generates a subsequence, which is concatenated in the hierarchy. Finally, a tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples. Extensive evaluations show that ChainCoder outperforms state-of-the-arts, demonstrating that our progressive generation eases the reasoning procedure and guides the language model to generate higher-quality solutions. Our codes are available at: https://github.com/VITA-Group/ChainCoder.

chaincoder, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2305.00909

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(2 more...)

Add feedback