AITopics

2312.05448

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.54)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.34)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.33)

Passali, Tatiana, Chatzikyriakidis, Efstathios, Andreadis, Stelios, Stavropoulos, Thanos G., Matonaki, Anastasia, Fachantidis, Anestis, Tsoumakas, Grigorios

From Lengthy to Lucid: A Systematic Literature Review on NLP Techniques for Taming Long Sentences

arXiv.org Artificial IntelligenceDec-8-2023

Long sentences have been a persistent issue in written communication for many years since they make it challenging for readers to grasp the main points or follow the initial intention of the writer. This survey, conducted using the PRISMA guidelines, systematically reviews two main strategies for addressing the issue of long sentences: a) sentence compression and b) sentence splitting. An increased trend of interest in this area has been observed since 2005, with significant growth after 2017. Current research is dominated by supervised approaches for both sentence compression and splitting. Yet, there is a considerable gap in weakly and self-supervised techniques, suggesting an opportunity for further research, especially in domains with limited data. In this survey, we categorize and group the most representative methods into a comprehensive taxonomy. We also conduct a comparative evaluation analysis of these methods on common sentence compression and splitting datasets. Finally, we discuss the challenges and limitations of current methods, providing valuable insights for future research directions. This survey is meant to serve as a comprehensive resource for addressing the complexities of long sentences. We aim to enable researchers to make further advancements in the field until long sentences are no longer a barrier to effective communication.

compression, computational linguistic, sentence compression, (11 more...)

2312.05172

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
Oceania > Australia > New South Wales > Sydney (0.14)
North America > United States > Washington > King County > Seattle (0.14)
(46 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(5 more...)

Phatthiyaphaibun, Wannaphong, Chaovavanich, Korakot, Polpanumas, Charin, Suriyawongkul, Arthit, Lowphansirikul, Lalita, Chormai, Pattarawat, Limkonchotiwat, Peerat, Suntorntip, Thanathip, Udomcharoenchaikit, Can

PyThaiNLP: Thai Natural Language Processing in Python

arXiv.org Artificial IntelligenceDec-7-2023

We present PyThaiNLP, a free and open-source natural language processing (NLP) library for Thai language implemented in Python. It provides a wide range of software, models, and datasets for Thai language. We first provide a brief historical context of tools for Thai language prior to the development of PyThaiNLP. We then outline the functionalities it provided as well as datasets and pre-trained language models. We later summarize its development milestones and discuss our experience during its development. We conclude by demonstrating how industrial and research communities utilize PyThaiNLP in their work. The library is freely available at https://github.com/pythainlp/pythainlp.

dataset, proceedings, pythainlp, (13 more...)

2312.04649

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Singapore (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(17 more...)

Genre: Research Report (0.40)

Industry:

Information Technology (0.68)
Education (0.46)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.96)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.70)

Jain, Parag, Lapata, Mirella

Conversational Semantic Parsing using Dynamic Context Graphs

arXiv.org Artificial IntelligenceDec-7-2023

In this paper we consider the task of conversational semantic parsing over general purpose knowledge graphs (KGs) with millions of entities, and thousands of relation-types. We focus on models which are capable of interactively mapping user utterances into executable logical forms (e.g., Sparql) in the context of the conversational history. Our key idea is to represent information about an utterance and its context via a subgraph which is created dynamically, i.e., the number of nodes varies per utterance. Rather than treating the subgraph as a sequence, we exploit its underlying structure and encode it with a graph neural network which further allows us to represent a large number of (unseen) nodes. Experimental results show that dynamic context modeling is superior to static approaches, delivering performance improvements across the board (i.e., for simple and complex questions). Our results further confirm that modeling the structure of context is better at processing discourse information, (i.e., at handling ellipsis and resolving coreference) and longer interactions.

computational linguistic, information, utterance, (13 more...)

2305.06164

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > China > Hong Kong (0.04)
(14 more...)

Genre: Research Report > New Finding (0.68)

Industry: Media (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Themistocleous, Charalambos, Tsapkini, Kyrana, Kokkinakis, Dimitrios

Assessing Language Disorders using Artificial Intelligence: a Paradigm Shift

Speech, language, and communication deficits are present in most neurodegenerative syndromes. They enable the early detection, diagnosis, treatment planning, and monitoring of neurocognitive disease progression as part of traditional neurological assessment. Nevertheless, standard speech and language evaluation is time-consuming and resource-intensive for clinicians. We argue that using machine learning methodologies, natural language processing, and modern artificial intelligence (AI) for Language Assessment is an improvement over conventional manual assessment. Using these methodologies, Computational Language Assessment (CLA) accomplishes three goals: (i) provides a neuro-cognitive evaluation of speech, language, and communication in elderly and high-risk individuals for dementia; (ii) facilitates the diagnosis, prognosis, and therapy efficacy in at-risk and language-impaired populations; and (iii) allows easier extensibility to assess patients from a wide range of languages. By employing AI models, CLA may inform neurocognitive theory on the relationship between language symptoms and their neural bases. Finally, it signals a paradigm shift by significantly advancing our ability to optimize the prevention and treatment of elderly individuals with communication disorders, allowing them to age gracefully with social engagement.

assessment, impairment, themistocleous, (15 more...)

2305.20046

Country:

Europe > Norway > Eastern Norway > Oslo (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
North America > United States > New York > New York County > New York City (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.70)
Health & Medicine > Therapeutic Area > Neurology > Dementia (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Groschwitz, Jonas, Cohen, Shay B., Donatelli, Lucia, Fowlie, Meaghan

AMR Parsing is Far from Solved: GrAPES, the Granular AMR Parsing Evaluation Suite

We present the Granular AMR Parsing Evaluation Suite (GrAPES), a challenge set for Abstract Meaning Representation (AMR) parsing with accompanying evaluation metrics. AMR parsers now obtain high scores on the standard AMR evaluation metric Smatch, close to or even above reported inter-annotator agreement. But that does not mean that AMR parsing is solved; in fact, human evaluation in previous work indicates that current parsers still quite frequently make errors on node labels or graph structure that substantially distort sentence meaning. Here, we provide an evaluation suite that tests AMR parsers on a range of phenomena of practical, technical, and linguistic interest. Our 36 categories range from seen and unseen labels, to structural generalization, to coreference. GrAPES reveals in depth the abilities and shortcomings of current AMR parsers.

category, parser, reentrancy, (13 more...)

2312.0348

Country:

Asia > North Korea (0.28)
North America > Haiti (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(13 more...)

Genre: Research Report (0.82)

Industry: Government (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Tanaka-Ishii, Kumiko, Tanaka, Akira

Strahler Number of Natural Language Sentences in Comparison with Random Trees

The Strahler number was originally proposed to characterize the complexity of river bifurcation and has found various applications. This article proposes computation of the Strahler number's upper and lower limits for natural language sentence tree structures. Through empirical measurements across grammatically annotated data, the Strahler number of natural language sentences is shown to be almost 3 or 4, similarly to the case of river bifurcation as reported by Strahler (1957). From the theory behind the number, we show that it is one kind of lower limit on the amount of memory required to process sentences. We consider the Strahler number to provide reasoning that explains reports showing that the number of required memory areas to process sentences is 3 to 4 for parsing (Schuler et al., 2010), and reports indicating a psychological "magical number" of 3 to 5 (Cowan, 2001). An analytical and empirical analysis shows that the Strahler number is not constant but grows logarithmically; therefore, the Strahler number of sentences derives from the range of sentence lengths. Furthermore, the Strahler number is not different for random trees, which could suggest that its origin is not specific to natural language.

lower limit, sentence structure, strahler number, (15 more...)

2307.02697

Country:

Europe > Hungary > Csongrád-Csanád County > Szeged (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(9 more...)

Genre:

Research Report (0.82)
Overview (0.67)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Error Detection for Text-to-SQL Semantic Parsing

Chen, Shijie, Chen, Ziru, Sun, Huan, Su, Yu

Despite remarkable progress in text-to-SQL semantic parsing in recent years, the performance of existing parsers is still far from perfect. Specifically, modern text-to-SQL parsers based on deep learning are often over-confident, thus casting doubt on their trustworthiness when deployed for real use. In this paper, we propose a parser-independent error detection model for text-to-SQL semantic parsing. Using a language model of code as its bedrock, we enhance our error detection model with graph neural networks that learn structural features of both natural language questions and SQL queries. We train our model on realistic parsing errors collected from a cross-domain setting, which leads to stronger generalization ability. Experiments with three strong text-to-SQL parsers featuring different decoding mechanisms show that our approach outperforms parser-dependent uncertainty metrics. Our model could also effectively improve the performance and usability of text-to-SQL semantic parsers regardless of their architectures. (Our implementation is available at https://github.com/OSU-NLP-Group/Text2SQL-Error-Detection)

computational linguistic, parser, prediction, (11 more...)

2305.13683

Country:

North America > United States > Ohio (0.04)
Asia > China > Hong Kong (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(5 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Coupette, Corinna, Vreeken, Jilles, Rieck, Bastian

All the World's a (Hyper)Graph: A Data Drama

We introduce Hyperbard, a dataset of diverse relational data representations derived from Shakespeare's plays. Our representations range from simple graphs capturing character co-occurrence in single scenes to hypergraphs encoding complex communication settings and character contributions as hyperedges with edge-specific node weights. By making multiple intuitive representations readily available for experimentation, we facilitate rigorous representation robustness checks in graph learning, graph mining, and network analysis, highlighting the advantages and drawbacks of specific representations. Leveraging the data released in Hyperbard, we demonstrate that many solutions to popular graph mining problems are highly dependent on the representation choice, thus calling current graph curation practices into question. As an homage to our data source, and asserting that science can also be art, we present all our points in the form of a play.

capulet, representation, servant, (16 more...)

doi: 10.1093/llc/fqad071

2206.08225

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Germany > Saarland > Saarbrücken (0.04)
(2 more...)

Genre: Research Report (0.63)

Industry:

Law (0.67)
Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (0.88)
Information Technology > Artificial Intelligence > Machine Learning (0.87)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)

Sharma, Ashwyn, Feldman, David I., Jain, Aneesh

Fine-tuning pre-trained extractive QA models for clinical document parsing

arXiv.org Artificial IntelligenceDec-4-2023

Electronic health records (EHRs) contain a vast amount of high-dimensional multi-modal data that can accurately represent a patient's medical history. Unfortunately, most of this data is either unstructured or semi-structured, rendering it unsuitable for real-time and retrospective analyses. A remote patient monitoring (RPM) program for Heart Failure (HF) patients needs to have access to clinical markers like EF (Ejection Fraction) or LVEF (Left Ventricular Ejection Fraction) in order to ascertain eligibility and appropriateness for the program. This paper explains a system that can parse echocardiogram reports and verify EF values. This system helps identify eligible HF patients who can be enrolled in such a program. At the heart of this system is a pre-trained extractive QA transformer model that is fine-tuned on custom-labeled data. The methods used to prepare such a model for deployment are illustrated by running experiments on a public clinical dataset like MIMIC-IV-Note. The pipeline can be used to generalize solutions to similar problems in a low-resource setting. We found that the system saved over 1500 hours for our clinicians over 12 months by automating the task at scale.

dataset, pipeline, qa model, (12 more...)

2312.02314

Country:

North America > United States > Virginia (0.04)
North America > United States > Massachusetts (0.04)
Europe > Germany > Bavaria > Lower Franconia > Würzburg (0.04)

Genre:

Research Report > New Finding (0.34)
Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.41)