Goto

Collaborating Authors

 nlp technique


Meursault as a Data Point

arXiv.org Artificial Intelligence

Abstract--In an era dominated by datafication, the reduction of human experiences to quantifiable metrics raises profound philosophical and ethical questions. This paper explores these issues through the lens of Meursault, the protagonist of Albert Camus' The Stranger, whose emotionally detached existence epitomizes the existential concept of absurdity. Using natural language processing (NLP) techniques including emotion detection (BERT), sentiment analysis (V ADER), and named entity recognition (spaCy)-this study quantifies key events and behaviors in Meursault's life. Our analysis reveals the inherent limitations of applying algorithmic models to complex human experiences, particularly those rooted in existential alienation and moral ambiguity. By examining how modern AI tools misinterpret Meursault's actions and emotions, this research underscores the broader ethical dilemmas of reducing nuanced human narratives to data points, challenging the foundational assumptions of our data-driven society. The findings presented in this paper serve as a critique of the increasing reliance on data-driven narratives and advocate for incorporating humanistic values in artificial intelligence. In the digital age, the quantification of human experience has become a dominant paradigm, promising objectivity and predictive power [5]. However, this reductionist approach, known as datafication, risks obscuring the complexity and nuance inherent in human existence [6].


InsurTech innovation using natural language processing

arXiv.org Machine Learning

InsurTech refers to the use of state-of-the-art technology, including both emerging hardware and software, to address inefficiencies across the insurance value chain and further explore new opportunities to reshape traditional business operations. InsurTech encompasses a broad spectrum of technology-driven innovations, including, but not limited to, telematics, usage-based insurance, and the integration of Internet of Things (IoT) sensors. In this study, we focus on a specific class of InsurTech, an Insurtech data vendor, that provides insurance companies with next-generation data solutions. We leverage new and diverse external data sources, such as social media data and online content, to enrich the internal database, thereby empowering actuarial analytics and gaining more accurate insights into risk profiles and policyholder behavior. Specifically, by integrating alternative data sources beyond traditional information, insurance companies can uncover previously unrecognized risk factors, reduce bias in existing features, and identify more accurate risk exposures based on the operational characteristics of the insured entities.


Measuring Online Hate on 4chan using Pre-trained Deep Learning Models

arXiv.org Artificial Intelligence

-- Online hate speech can harmfully impact individuals and groups, specifically on non - moderated platforms such as 4chan where users can post anonymous content. This work focuses on analy s ing and measuring the prevalence of online hat e on 4chan's politically incorrect board (/pol/) using state - of - the - art Natural Language Processing (NLP) models, specifically transformer - based models such as RoBERTa and Detoxify . By leveraging these advanced models, we provide an in - depth analysis of hate speech dynamics and quantify the extent of online hate non - moderated platforms. The study advances understanding through multi - class classification of hate speech (racism, sexism, religion, etc.), while also incorporating the classification of toxic content (e.g., identity attacks and threats) and a further topic modelling analysis. The results show that 11.20% of this dataset is identified as containing hate in different categories. These evaluations show that online hate is manifested in various forms, confirming the complicated and volatile nature of detection in the wild. Index Terms -- Hate speech, machine learning, natural language processing (NLP), online hate, toxicity analysis. INTRODUCTION H E SPREAD of hate speech on online platforms has become a serious problem in our society. As digital communication becomes ubiquitous, platforms like 4chan, known for their anonymity and minimal moderation, have become hotspots for this harmful behaviour . This is particularly evident on its politically incorrect board, /pol/, a notorious board dedicated to discussing politics and current events, often associated with hate speech, extremist content, and conspiracy theories [1] . The anonymity provided by these platforms often encourages users to express extreme ideologies [2] . This issue raises significant concerns about the impact on at - risk and vulnerable groups as it can cause real - world harm, including psychological trauma. Therefore, a systematic approach is needed to measure and understand the prevalence and forms of online hate. Received 28 August 2024; revised 23 December 2024, 10 February 2025, and 6 March 2025; accepted 6 March 2025. This work is supported by the UK Research and Innovation (UKRI), through the Strategic Priority Fund as part of the Protecting Citizens Online programme (AGENCY: Assuring Citizen Agency in a World with Complex Online Harms, EP/W032481/2).


Applications of natural language processing in aviation safety: A review and qualitative analysis

arXiv.org Artificial Intelligence

This study explores using Natural Language Processing in aviation safety, focusing on machine learning algorithms to enhance safety measures. There are currently May 2024, 34 Scopus results from the keyword search natural language processing and aviation safety. Analyzing these studies allows us to uncover trends in the methodologies, findings and implications of NLP in aviation. Both qualitative and quantitative tools have been used to investigate the current state of literature on NLP for aviation safety. The qualitative analysis summarises the research motivations, objectives, and outcomes, showing how NLP can be utilized to help identify critical safety issues and improve aviation safety. This study also identifies research gaps and suggests areas for future exploration, providing practical recommendations for the aviation industry. We discuss challenges in implementing NLP in aviation safety, such as the need for large, annotated datasets, and the difficulty in interpreting complex models. We propose solutions like active learning for data annotation and explainable AI for model interpretation. Case studies demonstrate the successful application of NLP in improving aviation safety, highlighting its potential to make aviation safer and more efficient.


Integrating Natural Language Processing Techniques of Text Mining Into Financial System: Applications and Limitations

arXiv.org Artificial Intelligence

The financial sector, a pivotal force in economic development, increasingly uses the intelligent technologies such as natural language processing to enhance data processing and insight extraction. This research paper through a review process of the time span of 2018-2023 explores the use of text mining as natural language processing techniques in various components of the financial system including asset pricing, corporate finance, derivatives, risk management, and public finance and highlights the need to address the specific problems in the discussion section. We notice that most of the research materials combined probabilistic with vector-space models, and text-data with numerical ones. The most used technique regarding information processing is the information classification technique and the most used algorithms include the long-short term memory and bidirectional encoder models. The research noticed that new specific algorithms are developed and the focus of the financial system is mainly on asset pricing component. The research also proposes a path from engineering perspective for researchers who need to analyze financial text. The challenges regarding text mining perspective such as data quality, context-adaption and model interpretability need to be solved so to integrate advanced natural language processing models and techniques in enhancing financial analysis and prediction. Keywords: Financial System (FS), Natural Language Processing (NLP), Software and Text Engineering, Probabilistic, Vector-Space, Models, Techniques, TextData, Financial Analysis.


Decade of Natural Language Processing in Chronic Pain: A Systematic Review

arXiv.org Artificial Intelligence

In recent years, the intersection of Natural Language Processing (NLP) and public health has opened innovative pathways for investigating various domains, including chronic pain in textual datasets. Despite the promise of NLP in chronic pain, the literature is dispersed across various disciplines, and there is a need to consolidate existing knowledge, identify knowledge gaps in the literature, and inform future research directions in this emerging field. This review aims to investigate the state of the research on NLP-based interventions designed for chronic pain research. A search strategy was formulated and executed across PubMed, Web of Science, IEEE Xplore, Scopus, and ACL Anthology to find studies published in English between 2014 and 2024. After screening 132 papers, 26 studies were included in the final review. Key findings from this review underscore the significant potential of NLP techniques to address pressing challenges in chronic pain research. The past 10 years in this field have showcased the utilization of advanced methods (transformers like RoBERTa and BERT) achieving high-performance metrics (e.g., F1>0.8) in classification tasks, while unsupervised approaches like Latent Dirichlet Allocation (LDA) and k-means clustering have proven effective for exploratory analyses. Results also reveal persistent challenges such as limited dataset diversity, inadequate sample sizes, and insufficient representation of underrepresented populations. Future research studies should explore multimodal data validation systems, context-aware mechanistic modeling, and the development of standardized evaluation metrics to enhance reproducibility and equity in chronic pain research.


Natural Language Processing for Analyzing Electronic Health Records and Clinical Notes in Cancer Research: A Review

arXiv.org Artificial Intelligence

Objective: This review aims to analyze the application of natural language processing (NLP) techniques in cancer research using electronic health records (EHRs) and clinical notes. This review addresses gaps in the existing literature by providing a broader perspective than previous studies focused on specific cancer types or applications. Methods: A comprehensive literature search was conducted using the Scopus database, identifying 94 relevant studies published between 2019 and 2024. Data extraction included study characteristics, cancer types, NLP methodologies, dataset information, performance metrics, challenges, and future directions. Studies were categorized based on cancer types and NLP applications. Results: The results showed a growing trend in NLP applications for cancer research, with breast, lung, and colorectal cancers being the most studied. Information extraction and text classification emerged as predominant NLP tasks. A shift from rule-based to advanced machine learning techniques, particularly transformer-based models, was observed. The Dataset sizes used in existing studies varied widely. Key challenges included the limited generalizability of proposed solutions and the need for improved integration into clinical workflows. Conclusion: NLP techniques show significant potential in analyzing EHRs and clinical notes for cancer research. However, future work should focus on improving model generalizability, enhancing robustness in handling complex clinical language, and expanding applications to understudied cancer types. Integration of NLP tools into clinical practice and addressing ethical considerations remain crucial for utilizing the full potential of NLP in enhancing cancer diagnosis, treatment, and patient outcomes.


Question-Answering (QA) Model for a Personalized Learning Assistant for Arabic Language

arXiv.org Artificial Intelligence

This paper describes the creation, optimization, and assessment of a question-answering (QA) model for a personalized learning assistant that uses BERT transformers customized for the Arabic language. The model was particularly finetuned on science textbooks in Palestinian curriculum. Our approach uses BERT's brilliant capabilities to automatically produce correct answers to questions in the field of science education. The model's ability to understand and extract pertinent information is improved by finetuning it using 11th and 12th grade biology book in Palestinian curriculum. This increases the model's efficacy in producing enlightening responses. Exact match (EM) and F1 score metrics are used to assess the model's performance; the results show an EM score of 20% and an F1 score of 51%. These findings show that the model can comprehend and react to questions in the context of Palestinian science book. The results demonstrate the potential of BERT-based QA models to support learning and understanding Arabic students questions.


Development of an NLP-driven computer-based test guide for visually impaired students

arXiv.org Artificial Intelligence

In recent years, advancements in Natural Language Processing (NLP) techniques have revolutionized the field of accessibility and exclusivity of testing, particularly for visually impaired students (VIS). CBT has shown in years back its relevance in terms of administering exams electronically, making the test process easier, providing quicker and more accurate results, and offering greater flexibility and accessibility for candidates. Yet, its relevance was not felt by the visually impaired students as they cannot access printed documents. Hence, in this paper, we present an NLP-driven Computer-Based Test guide for visually impaired students. It employs a speech technology pre-trained methods to provide real-time assistance and support to visually impaired students. The system utilizes NLP technologies to convert the text-based questions and the associated options in a machine-readable format. Subsequently, the speech technology pre-trained model processes the converted text enabling the VIS to comprehend and analyze the content. Furthermore, we validated that this pre-trained model is not perverse by testing for accuracy using sample audio datasets labels (A, B, C, D, E, F, G) to compare with the voice recordings obtained from 20 VIS which is been predicted by the system to attain values for precision, recall, and F1-scores. These metrics are used to assess the performance of the pre-trained model and have indicated that it is proficient enough to give its better performance to the evaluated system. The methodology adopted for this system is Object Oriented Analysis and Design Methodology (OOADM) where Objects are discussed and built by modeling real-world instances.


Maintaining Journalistic Integrity in the Digital Age: A Comprehensive NLP Framework for Evaluating Online News Content

arXiv.org Artificial Intelligence

The rapid growth of online news platforms has led to an increased need for reliable methods to evaluate the quality and credibility of news articles. This paper proposes a comprehensive framework to analyze online news texts using natural language processing (NLP) techniques, particularly a language model specifically trained for this purpose, alongside other well-established NLP methods. The framework incorporates ten journalism standards-objectivity, balance and fairness, readability and clarity, sensationalism and clickbait, ethical considerations, public interest and value, source credibility, relevance and timeliness, factual accuracy, and attribution and transparency-to assess the quality of news articles. By establishing these standards, researchers, media organizations, and readers can better evaluate and understand the content they consume and produce. The proposed method has some limitations, such as potential difficulty in detecting subtle biases and the need for continuous updating of the language model to keep pace with evolving language patterns.