nlp tool
Fit for our purpose, not yours: Benchmark for a low-resource, Indigenous language
Influential and popular benchmarks in AI are largely irrelevant to developing NLP tools for low-resource, Indigenous languages. With the primary goal of measuring the performance of general-purpose AI systems, these benchmarks fail to give due consideration and care to individual language communities, especially low-resource languages. The datasets contain numerous grammatical and orthographic errors, poor pronunciation, and limited vocabulary, and their content lacks cultural relevance to the language community. To overcome the issues with these benchmarks, we have created a dataset for te reo Māori (the Indigenous language of Aotearoa/New Zealand) to pursue NLP tools that are 'fit-for-our-purpose'. This paper demonstrates how low-resourced, Indigenous languages can develop tailored, high-quality benchmarks that: i. Consider the impact of colonisation on their language; ii.
Thoughtful Adoption of NLP for Civic Participation: Understanding Differences Among Policymakers
Guridi, Jose A., Cheyre, Cristobal, Yang, Qian
Natural language processing (NLP) tools have the potential to boost civic participation and enhance democratic processes because they can significantly increase governments' capacity to gather and analyze citizen opinions. However, their adoption in government remains limited, and harnessing their benefits while preventing unintended consequences remains a challenge. While prior work has focused on improving NLP performance, this work examines how different internal government stakeholders influence NLP tools' thoughtful adoption. We interviewed seven politicians (politically appointed officials as heads of government institutions) and thirteen public servants (career government employees who design and administrate policy interventions), inquiring how they choose whether and how to use NLP tools to support civic participation processes. The interviews suggest that policymakers across both groups focused on their needs for career advancement and the need to showcase the legitimacy and fairness of their work when considering NLP tool adoption and use. Because these needs vary between politicians and public servants, their preferred NLP features and tool designs also differ. Interestingly, despite their differing needs and opinions, neither group clearly identifies who should advocate for NLP adoption to enhance civic participation or address the unintended consequences of a poorly considered adoption. This lack of clarity in responsibility might have caused the governments' low adoption of NLP tools. We discuss how these findings reveal new insights for future HCI research. They inform the design of NLP tools for increasing civic participation efficiency and capacity, the design of other tools and methods that ensure thoughtful adoption of AI tools in government, and the design of NLP tools for collaborative use among users with different incentives and needs.
CNER: A tool Classifier of Named-Entity Relationships
Torres, Jefferson A. Peña, De Piñerez, Raúl E. Gutiérrez
However, Spanish is occasionally adopted as the focus language for research endeavors, and as a result multiple projects are conducted in Spanish to explore language-specific nuances and challenges in NLP applications. Named-Entity Recognition [1], Machine Translation [2], and Semantic Relation Extraction [3], among other tasks, have been conducted with a focus on Spanish language data, allowing for a more nuanced understanding of the intricacies involved. In this paper we present Classifier for Named Entities Recognized (CNER), a linguistically aware online service that makes it possible to test two main NLP tasks, Named Entity Recognition (NER) and Relation Extraction (RE), for the Spanish language. This service, together with other projects on the Spanish language, has been evaluated and adapted as a web service. In this context, language technologies and natural language processing (NLP) tools can support the identification of useful information in text and promote its understanding. Specifically, CNER i) identifies mentions following the ACE standard, with entity types including Person (PER), Organisation (ORG), Facility (FAC), Location (LOC), Geographical/Political (GPE), Vehicle (VEH) and Weapon (WEA) [4], [5]; ii) displays three different NER tools as a previous step to the RE task; and iii) offers entity relationship information through the tags GPE-AFF, PHYS, DISC, EMP-ORG, ART and NON-REL, representing the relations between two entities [6].
NLP for Maternal Healthcare: Perspectives and Guiding Principles in the Age of LLMs
Antoniak, Maria, Naik, Aakanksha, Alvarado, Carla S., Wang, Lucy Lu, Chen, Irene Y.
Ethical frameworks for the use of natural language processing (NLP) are urgently needed to shape how large language models (LLMs) and similar tools are used for healthcare applications. Healthcare faces existing challenges including the balance of power in clinician-patient relationships, systemic health disparities, historical injustices, and economic constraints. Drawing directly from the voices of those most affected, and focusing on a case study of a specific healthcare setting, we propose a set of guiding principles for the use of NLP in maternal healthcare. We led an interactive session centered on an LLM-based chatbot demonstration during a full-day workshop with 39 participants, and additionally surveyed 30 healthcare workers and 30 birthing people about their values, needs, and perceptions of NLP tools in the context of maternal health. We conducted quantitative and qualitative analyses of the survey results and interactive discussions to consolidate our findings into a set of guiding principles. We propose nine principles for ethical use of NLP for maternal healthcare, grouped into three themes: (i) recognizing contextual significance (ii) holistic measurements, and (iii) who/what is valued. For each principle, we describe its underlying rationale and provide practical advice. This set of principles can provide a methodological pattern for other researchers and serve as a resource to practitioners working on maternal health and other healthcare fields to emphasize the importance of technical nuance, historical context, and inclusive design when developing NLP technologies for clinical use.
A Comparison of Veterans with Problematic Opioid Use Identified through Natural Language Processing of Clinical Notes versus Using Diagnostic Codes
Workman, Terri Elizabeth, Kupersmith, Joel, Ma, Phillip, Spevak, Christopher, Sandbrink, Friedhelm, Zeng-Treitler, Yan Cheng Qing
Background: Electronic health records (EHRs) are a data source for opioid research. Opioid use disorder is known to be under-coded as a diagnosis, yet problematic opioid use can be documented in clinical notes. Objectives: Our goals were 1) to identify problematic opioid use from a full range of clinical notes; and 2) to compare the characteristics of patients identified as having problematic opioid use, exclusively documented in clinical notes, to those having documented ICD opioid use disorder diagnostic codes. Materials and Methods: We developed and applied a natural language processing (NLP) tool to the clinical notes of a patient cohort (n=222,371) from two Veteran Affairs service regions to identify patients with problematic opioid use. We also used a set of ICD diagnostic codes to identify patients with opioid use disorder from the same cohort. We compared the demographic and clinical characteristics of patients identified only through NLP, to those of patients identified through ICD codes. Results: NLP exclusively identified 57,331 patients; 6,997 patients had positive ICD code identifications. Patients exclusively identified through NLP were more likely to be women. Those identified through ICD codes were more likely to be male, younger, have concurrent benzodiazepine prescriptions, more comorbidities, more care encounters, and less likely to be married. Patients in the NLP and ICD groups had substantially elevated comorbidity levels compared to patients not documented as experiencing problematic opioid use. Conclusions: NLP is a feasible approach for identifying problematic opioid use not otherwise recorded by ICD codes. Clinicians may be reluctant to code for opioid use disorder. It is therefore incumbent on the healthcare team to search for documentation of opioid concerns within clinical notes.
Validation of a Zero-Shot Learning Natural Language Processing Tool for Data Abstraction from Unstructured Healthcare Data
Kaufmann, Basil, Busby, Dallin, Das, Chandan Krushna, Tillu, Neeraja, Menon, Mani, Tewari, Ashutosh K., Gorin, Michael A.
Objectives: To describe the development and validation of a zero-shot learning natural language processing (NLP) tool for abstracting data from unstructured text contained within PDF documents, such as those found within electronic health records. Materials and Methods: A data abstraction tool based on the GPT-3.5 model from OpenAI was developed and compared to three physician human abstractors in terms of time to task completion and accuracy for abstracting data on 14 unique variables from a set of 199 de-identified radical prostatectomy pathology reports. The reports were processed by the software tool in vectorized and scanned formats to establish the impact of optical character recognition on data abstraction. The tool was assessed for superiority for data abstraction speed and non-inferiority for accuracy. Results: The human abstractors required a mean of 101 s per report for data abstraction, with times varying from 15 to 284 s. In comparison, the software tool required a mean of 12.8 s to process the vectorized reports and a mean of 15.8 s to process the scanned reports (P < 0.001). The overall accuracies of the three human abstractors were 94.7%, 97.8%, and 96.4% for the combined set of 2786 datapoints. The software tool had an overall accuracy of 94.2% for the vectorized reports, proving to be non-inferior to the human abstractors at a margin of -10% ($\alpha$=0.025). The tool had a slightly lower accuracy of 88.7% using the scanned reports, proving to be non-inferior to 2 out of 3 human abstractors. Conclusion: The developed zero-shot learning NLP tool affords researchers comparable levels of accuracy to that of human abstractors, with significant time savings benefits. Because of the lack of need for task-specific model training, the developed tool is highly generalizable and can be used for a wide variety of data abstraction tasks, even outside the field of medicine.
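The non-inferiority comparison reported in this abstract can be illustrated with a short sketch. The abstract does not say which statistical test the authors used, so the code below assumes a one-sided Wald z-test on the difference of two proportions. It treats the tool's and each abstractor's accuracy as independent samples of 2786 datapoints each, which is a simplification because the same reports were abstracted by both. The function name `noninferiority_z` and its parameters are hypothetical. Only the accuracies, the 10% margin, and the 0.025 significance level come from the abstract.

```python
from math import sqrt
from statistics import NormalDist


def noninferiority_z(p_new, p_ref, n_new, n_ref, margin=0.10, alpha=0.025):
    """One-sided Wald z-test for non-inferiority of two proportions.

    H0: p_new - p_ref <= -margin   (new method is inferior)
    H1: p_new - p_ref >  -margin   (new method is non-inferior)

    Assumption: independent samples, which the abstract does not confirm.
    Returns (z statistic, one-sided p-value, non-inferior at alpha?).
    """
    # Unpooled standard error of the difference in proportions.
    se = sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    # Shift the observed difference by the margin before standardising.
    z = (p_new - p_ref + margin) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value, p_value < alpha


if __name__ == "__main__":
    n = 2786  # 14 variables x 199 reports, as stated in the abstract
    humans = [0.947, 0.978, 0.964]
    for label, p_tool in [("vectorized", 0.942), ("scanned", 0.887)]:
        for p_human in humans:
            z, p, ok = noninferiority_z(p_tool, p_human, n, n)
            print(f"{label} vs {p_human:.3f}: z={z:.2f}, p={p:.4f}, non-inferior={ok}")
```

Under these assumptions the sketch flags the vectorized run as non-inferior to all three abstractors and the scanned run as non-inferior to two of the three, which matches the pattern the abstract describes. A paired analysis of the per-datapoint results could give different numbers.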
Examining risks of racial biases in NLP tools for child protective services
Field, Anjalie, Coston, Amanda, Gandhi, Nupoor, Chouldechova, Alexandra, Putnam-Hornstein, Emily, Steier, David, Tsvetkov, Yulia
Although much literature has established the presence of demographic bias in natural language processing (NLP) models, most work relies on curated bias metrics that may not be reflective of real-world applications. At the same time, practitioners are increasingly using algorithmic tools in high-stakes settings, with particular recent interest in NLP. In this work, we focus on one such setting: child protective services (CPS). CPS workers often write copious free-form text notes about families they are working with, and CPS agencies are actively seeking to deploy NLP models to leverage these data. Given well-established racial bias in this setting, we investigate possible ways deployed NLP is liable to increase racial disparities. We specifically examine word statistics within notes and algorithmic fairness in risk prediction, coreference resolution, and named entity recognition (NER). We document consistent algorithmic unfairness in NER models, possible algorithmic unfairness in coreference resolution models, and little evidence of exacerbated racial bias in risk prediction. While there is existing pronounced criticism of risk prediction, our results expose previously undocumented risks of racial bias in realistic information extraction systems, highlighting potential concerns in deploying them, even though they may appear more benign. Our work serves as a rare realistic examination of NLP algorithmic fairness in a potential deployed setting and a timely investigation of a specific risk associated with deploying NLP in CPS settings.
5 examples of effective NLP in customer service
The study of natural language processing has been around for more than 50 years, but only recently has it reached the level of accuracy needed to provide real value. From interactive chatbots that can automatically respond to human requests to voice assistants used in our daily life, the power of AI-enabled natural language processing (NLP) is improving the interactions between humans and machines. NLP is broadly defined as the automatic manipulation of natural language, either in speech or text form, by software. NLP-enabled systems aim to understand human speech and typed language, interpret it in a form that machines can process, and respond back using human language forms rather than code. AI systems have greatly improved the accuracy and flexibility of NLP systems, enabling machines to communicate in hundreds of languages and across different application domains.