 nlp application


NERCat: Fine-Tuning for Enhanced Named Entity Recognition in Catalan

Ferreres, Guillem Cadevall, Sanz, Marc Serrano, Gámez, Marc Bardeli, Basullas, Pol Gerdt, Ruiz, Francesc Tarres, Ferrero, Raul Quijada

arXiv.org Artificial Intelligence

Named Entity Recognition (NER) is a critical component of Natural Language Processing (NLP) for extracting structured information from unstructured text. However, for low-resource languages like Catalan, the performance of NER systems often suffers due to the lack of high-quality annotated datasets. This paper introduces NERCat, a fine-tuned version of the GLiNER[1] model, designed to improve NER performance specifically for Catalan text. We used a dataset of manually annotated Catalan television transcriptions to train and fine-tune the model, focusing on domains such as politics, sports, and culture. The evaluation results show significant improvements in precision, recall, and F1-score, particularly for underrepresented named entity categories such as Law, Product, and Facility. This study demonstrates the effectiveness of domain-specific fine-tuning in low-resource languages and highlights the potential for enhancing Catalan NLP applications through manual annotation and high-quality datasets.
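The per-category precision, recall, and F1-score mentioned above can be illustrated with a minimal sketch. It assumes exact-match scoring over `(start, end, label)` spans, which is a common convention but is not confirmed as the paper's evaluation protocol. The function name `entity_prf` and the toy spans are hypothetical.

```python
def entity_prf(gold, pred):
    """Per-label (precision, recall, F1) for exact-match entity spans.

    gold, pred: sets of (start, end, label) tuples. Exact-match scoring
    is an assumption here, not necessarily the paper's protocol.
    """
    labels = {span[2] for span in gold} | {span[2] for span in pred}
    scores = {}
    for label in sorted(labels):
        g = {s for s in gold if s[2] == label}
        p = {s for s in pred if s[2] == label}
        tp = len(g & p)  # spans that match in boundaries and label
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy example: the "Law" entity is mislabelled as "Product".
gold = {(0, 2, "Person"), (5, 7, "Law"), (10, 11, "Facility")}
pred = {(0, 2, "Person"), (5, 7, "Product"), (10, 11, "Facility")}
scores = entity_prf(gold, pred)
for label, (p, r, f) in scores.items():
    print(f"{label}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Reporting scores per label, rather than a single aggregate, is what makes gains on rare categories such as Law, Product, and Facility visible.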


Review for NeurIPS paper: Strongly Incremental Constituency Parsing with Graph Neural Networks

Neural Information Processing Systems

This is a borderline paper. The technical contribution is interesting and appreciated by the reviewers. The results match the state of the art on PTB and are better on CTB. There are, however, some concerns with the paper. One of the reviewers summarized it very well: "In its present form, the scope of the paper seems too narrow. It is also somewhat unclear whom the intended audience ought to be. If the work aims to say something about psycholinguistics, the experiment should reflect that. If the work's goal is to support NLP applications, further justifications and motivations should be provided as to how a strongly incremental constituency parser might be useful in a current NLP pipeline. If the work aims to shed lights on our understanding of GNN, the paper would need to be refocused accordingly."


Leveraging AI and NLP for Bank Marketing: A Systematic Review and Gap Analysis

Gerling, Christopher, Lessmann, Stefan

arXiv.org Artificial Intelligence

This paper explores the growing impact of AI and NLP in bank marketing, highlighting their evolving roles in enhancing marketing strategies, improving customer engagement, and creating value within this sector. While AI and NLP have been widely studied in general marketing, there is a notable gap in understanding their specific applications and potential within the banking sector. This research addresses this specific gap by providing a systematic review and strategic analysis of AI and NLP applications in bank marketing, focusing on their integration across the customer journey and operational excellence. Employing the PRISMA methodology, this study systematically reviews existing literature to assess the current landscape of AI and NLP in bank marketing. Additionally, it incorporates semantic mapping using Sentence Transformers and UMAP for strategic gap analysis to identify underexplored areas and opportunities for future research. The systematic review reveals limited research specifically focused on NLP applications in bank marketing. The strategic gap analysis identifies key areas where NLP can further enhance marketing strategies, including customer-centric applications like acquisition, retention, and personalized engagement, offering valuable insights for both academic research and practical implementation. This research contributes to the field of bank marketing by mapping the current state of AI and NLP applications and identifying strategic gaps. The findings provide actionable insights for developing NLP-driven growth and innovation frameworks and highlight the role of NLP in improving operational efficiency and regulatory compliance. This work has broader implications for enhancing customer experience, profitability, and innovation in the banking industry.


Natural Language Processing for Analyzing Electronic Health Records and Clinical Notes in Cancer Research: A Review

Bilal, Muhammad, Hamza, Ameer, Malik, Nadia

arXiv.org Artificial Intelligence

Objective: This review aims to analyze the application of natural language processing (NLP) techniques in cancer research using electronic health records (EHRs) and clinical notes. This review addresses gaps in the existing literature by providing a broader perspective than previous studies focused on specific cancer types or applications. Methods: A comprehensive literature search was conducted using the Scopus database, identifying 94 relevant studies published between 2019 and 2024. Data extraction included study characteristics, cancer types, NLP methodologies, dataset information, performance metrics, challenges, and future directions. Studies were categorized based on cancer types and NLP applications. Results: The results showed a growing trend in NLP applications for cancer research, with breast, lung, and colorectal cancers being the most studied. Information extraction and text classification emerged as predominant NLP tasks. A shift from rule-based to advanced machine learning techniques, particularly transformer-based models, was observed. The dataset sizes used in existing studies varied widely. Key challenges included the limited generalizability of proposed solutions and the need for improved integration into clinical workflows. Conclusion: NLP techniques show significant potential in analyzing EHRs and clinical notes for cancer research. However, future work should focus on improving model generalizability, enhancing robustness in handling complex clinical language, and expanding applications to understudied cancer types. Integration of NLP tools into clinical practice and addressing ethical considerations remain crucial for utilizing the full potential of NLP in enhancing cancer diagnosis, treatment, and patient outcomes.


Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP

Bakagianni, Juli, Pouli, Kanella, Gavriilidou, Maria, Pavlopoulos, John

arXiv.org Artificial Intelligence

Natural Language Processing (NLP) research has traditionally focused predominantly on English, driven by the availability of resources, the size of the research community, and market demands. Recently, there has been a noticeable shift towards multilingualism in NLP, recognizing the need for inclusivity and effectiveness across diverse languages and cultures. Monolingual surveys have the potential to complement the broader trend towards multilingualism in NLP by providing foundational insights and resources necessary for effectively addressing the linguistic diversity of global communication. However, monolingual NLP surveys are extremely rare in the literature. This study fills the gap by introducing a method for creating systematic and comprehensive monolingual NLP surveys. Characterized by a structured search protocol, it can be used to select publications and organize them through a taxonomy of NLP tasks. We include a classification of Language Resources (LRs), according to their availability, and datasets, according to their annotation, to highlight publicly-available and machine-actionable LRs. By applying our method, we conducted a systematic literature review of Greek NLP from 2012 to 2022, providing a comprehensive overview of the current state and challenges of Greek NLP research. We discuss the progress of Greek NLP and outline encountered Greek LRs, classified by availability and usability. As we show, our proposed method helps to avoid common pitfalls, such as data leakage and contamination, and to assess language support per NLP task. We consider this systematic literature review of Greek NLP an application of our method that showcases the benefits of a monolingual NLP survey. Similar applications could be considered for the myriad languages whose progress in NLP lags behind that of well-supported languages.


The Evolution of Darija Open Dataset: Introducing Version 2

Outchakoucht, Aissam, Es-Samaali, Hamza

arXiv.org Artificial Intelligence

Darija Open Dataset (DODa) represents an open-source project aimed at enhancing Natural Language Processing capabilities for the Moroccan dialect, Darija. With approximately 100,000 entries, DODa stands as the largest collaborative project of its kind for Darija-English translation. The dataset features semantic and syntactic categorizations, variations in spelling, verb conjugations across multiple tenses, as well as tens of thousands of translated sentences. The dataset includes entries written in both Latin and Arabic alphabets, reflecting the linguistic variations and preferences found in different sources and applications. The availability of such a dataset is critical for developing applications that can accurately understand and generate Darija, thus supporting the linguistic needs of the Moroccan community and potentially extending to similar dialects in neighboring regions. This paper explores the strategic importance of DODa, its current achievements, and the envisioned future enhancements that will continue to promote its use and expansion in the global NLP landscape.


When does MAML Work the Best? An Empirical Study on Model-Agnostic Meta-Learning in NLP Applications

Liu, Zequn, Zhang, Ruiyi, Song, Yiping, Ju, Wei, Zhang, Ming

arXiv.org Artificial Intelligence

Model-Agnostic Meta-Learning (MAML) has been successfully employed in NLP applications, including few-shot text classification and multi-domain low-resource language generation. Many factors, including data quantity, similarity among tasks, and the balance between the general language model and task-specific adaptation, can affect the performance of MAML in NLP, but few works have studied them thoroughly. In this paper, we conduct an empirical study to investigate these factors and, based on the experimental results, conclude when MAML works best.
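For readers unfamiliar with MAML, the following is a minimal sketch of its inner/outer update on a toy problem, not the NLP setup studied in the paper. It assumes a one-parameter model `y ≈ w * x` with mean squared error, so the second-order term of the meta-gradient can be written out by hand. All names (`maml_step`, `make_task`, the learning rates, the task slopes) are hypothetical choices for illustration.

```python
import random


def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """First derivative of the MSE with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def hess(data):
    """Second derivative of the MSE with respect to w (constant in w)."""
    return sum(2 * x * x for x, _ in data) / len(data)


def meta_loss(w, tasks, inner_lr=0.05):
    """Average query loss after one inner adaptation step per task."""
    return sum(mse(w - inner_lr * grad(w, s), q) for s, q in tasks) / len(tasks)


def maml_step(w, tasks, inner_lr=0.05, outer_lr=0.05):
    """One meta-update of the shared initialization w."""
    meta_grad = 0.0
    for support, query in tasks:
        # Inner loop: adapt to the task on its support set.
        w_adapted = w - inner_lr * grad(w, support)
        # Outer gradient: chain rule through the inner update,
        # d(w_adapted)/dw = 1 - inner_lr * hess(support).
        meta_grad += grad(w_adapted, query) * (1 - inner_lr * hess(support))
    return w - outer_lr * meta_grad / len(tasks)


def make_task(slope, rng, n=20):
    """Toy regression task y = slope * x, split into support and query sets."""
    points = [(x, slope * x) for x in (rng.uniform(-1, 1) for _ in range(2 * n))]
    return points[:n], points[n:]


rng = random.Random(0)
tasks = [make_task(1.0, rng), make_task(3.0, rng)]

w = 0.0
initial = meta_loss(w, tasks)
for _ in range(500):
    w = maml_step(w, tasks)
final = meta_loss(w, tasks)
print(f"meta-learned init w={w:.3f}, meta-loss {initial:.4f} -> {final:.4f}")
```

The learned initialization settles between the two task slopes. In this toy setting, that is the kind of trade-off between a general model and task-specific adaptation that the abstract lists among the factors affecting MAML.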


Generative User-Experience Research for Developing Domain-specific Natural Language Processing Applications

Zhukova, Anastasia, von Sperl, Lukas, Matt, Christian E., Gipp, Bela

arXiv.org Artificial Intelligence

Natural Language Processing (NLP) has recently been extensively incorporated into industrial and domain applications. For example, NLP is used for speeding up processes, e.g., automatic classification of types of customer feedback or filtering out spam emails; information extraction, e.g., named entity recognition to extract symptoms, diagnoses, and treatments from medical records; or auto-completing input forms with language models. Despite the broad integration, domain-specific NLP applications may require practicing more user-driven methodologies to address user needs with these applications. Often, the data-driven approach falls short in exploring the needs of the domain users (Yang, 2018). On the one hand, domain users are often integrated into development at the late test phase to evaluate the usability of ML/NLP applications (Carney, 2019). Unlike user-driven software development, the development of NLP applications depends mainly on data availability or experimenting with machine learning (ML)/NLP trends, which thus become the major drivers of application development. On the other hand, the user-driven development of a domain-specific ML/NLP application in medicine showed that close collaboration with the domain users in the earlier stages increases the effectiveness of the final product (Yang, 2017). Therefore, integrating user experience (UX) and human-computer interaction (HCI) research into ML/NLP research addresses users' needs, fuses their expertise, and increases intuitiveness, transparency, simplicity, and trust for the system users (Boukhelifa et al., 2018; Paleyes et al., 2022).


Identifying Mentions of Pain in Mental Health Records Text: A Natural Language Processing Approach

Chaturvedi, Jaya, Velupillai, Sumithra, Stewart, Robert, Roberts, Angus

arXiv.org Artificial Intelligence

Pain is a common reason for accessing healthcare resources and is a growing area of research, especially in its overlap with mental health. Mental health electronic health records are a good data source to study this overlap. However, much information on pain is held in the free text of these records, where mentions of pain present a unique natural language processing problem due to their ambiguous nature. This project uses data from an anonymised mental health electronic health records database. The data are used to train a machine-learning-based classification algorithm to classify sentences as discussing patient pain or not. This will facilitate the extraction of relevant pain information from large databases, and the use of such outputs for further studies on pain and mental health. A total of 1,985 documents were manually triple-annotated to create gold-standard training data, which were used to train three commonly used classification algorithms. The best performing model achieved an F1-score of 0.98 (95% CI 0.98-0.99).
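An F1-score with a 95% confidence interval, as reported above, can be estimated in several ways. The abstract does not say which method was used, so the sketch below assumes a percentile bootstrap over sentences purely for illustration. The function names and the toy labels are hypothetical and are not data from the study.

```python
import random


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap interval for F1.

    Resamples sentences with replacement; this is an assumed method,
    not necessarily the one behind the reported interval.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return f1_score(y_true, y_pred), lower, upper


# Toy labels: 1 = sentence discusses patient pain, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0] * 10
y_pred = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0] * 10
point, lower, upper = bootstrap_f1_ci(y_true, y_pred)
print(f"F1={point:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```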


AI in Natural Language Processing - A Complete Guide

#artificialintelligence

Artificial Intelligence (AI) has revolutionized the field of Natural Language Processing (NLP) by enabling computers to understand, analyze, and generate human language. NLP involves the use of computational techniques to process and manipulate natural language data, such as text and speech. AI algorithms, including Machine Learning (ML) and Deep Learning (DL), provide the foundation for NLP by enabling computers to learn from large amounts of language data and identify patterns and relationships in language that are difficult for humans to detect. It is clear that the role of AI in NLP is vital and is expected to continue driving progress in this field. As an artificial intelligence development company, we are at the forefront of this technology and are excited about the possibilities it offers for the future of human-computer interaction.