AITopics | severity score

Collaborating Authors

severity score

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Evaluating Adversarial Vulnerabilities in Modern Large Language Models

Perel, Tom

arXiv.org Artificial IntelligenceNov-25-2025

The recent boom and rapid integration of Large Language Models (LLMs) into a wide range of applications warrants a deeper understanding of their security and safety vulnerabilities. This paper presents a comparative analysis of the susceptibility to jailbreak attacks for two leading publicly available LLMs, Google's Gemini 2.5 Flash and OpenAI's GPT-4 (specifically the GPT-4o mini model accessible in the free tier). The research utilized two main bypass strategies: 'self-bypass', where models were prompted to circumvent their own safety protocols, and 'cross-bypass', where one model generated adversarial prompts to exploit vulnerabilities in the other. Four attack methods were employed - direct injection, role-playing, context manipulation, and obfuscation - to generate five distinct categories of unsafe content: hate speech, illegal activities, malicious code, dangerous content, and misinformation. The success of the attack was determined by the generation of disallowed content, with successful jailbreaks assigned a severity score. The findings indicate a disparity in jailbreak susceptibility between 2.5 Flash and GPT-4, suggesting variations in their safety implementations or architectural design. Cross-bypass attacks were particularly effective, indicating that an ample amount of vulnerabilities exist in the underlying transformer architecture. This research contributes a scalable framework for automated AI red-teaming and provides data-driven insights into the current state of LLM safety, underscoring the complex challenge of balancing model capabilities with robust safety mechanisms.

adversarial vulnerability, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2511.17666

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

Add feedback

Automated Triaging and Transfer Learning of Incident Learning Safety Reports Using Large Language Representational Models

Beidler, Peter, Nguyen, Mark, Lybarger, Kevin, Holmberg, Ola, Ford, Eric, Kang, John

arXiv.org Artificial IntelligenceSep-18-2025

PURPOSE: Incident reports are an important tool for safety and quality improvement in healthcare, but manual review is time-consuming and requires subject matter expertise. Here we present a natural language processing (NLP) screening tool to detect high-severity incident reports in radiation oncology across two institutions. METHODS AND MATERIALS: We used two text datasets to train and evaluate our NLP models: 7,094 reports from our institution (Inst.), and 571 from IAEA SAFRON (SF), all of which had severity scores labeled by clinical content experts. We trained and evaluated two types of models: baseline support vector machines (SVM) and BlueBERT which is a large language model pretrained on PubMed abstracts and hospitalized patient data. We assessed for generalizability of our model in two ways. First, we evaluated models trained using Inst.-train on SF-test. Second, we trained a BlueBERT_TRANSFER model that was first fine-tuned on Inst.-train then on SF-train before testing on SF-test set. To further analyze model performance, we also examined a subset of 59 reports from our Inst. dataset, which were manually edited for clarity. RESULTS Classification performance on the Inst. test achieved AUROC 0.82 using SVM and 0.81 using BlueBERT. Without cross-institution transfer learning, performance on the SF test was limited to an AUROC of 0.42 using SVM and 0.56 using BlueBERT. BlueBERT_TRANSFER, which was fine-tuned on both datasets, improved the performance on SF test to AUROC 0.78. Performance of SVM, and BlueBERT_TRANSFER models on the manually curated Inst. reports (AUROC 0.85 and 0.74) was similar to human performance (AUROC 0.81). CONCLUSION: In summary, we successfully developed cross-institution NLP models on incident report text from radiation oncology centers. These models were able to detect high-severity reports similarly to humans on a curated dataset.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.13706

Country: North America (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.94)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Nuclear Medicine (1.00)
Energy > Power Industry > Utilities > Nuclear (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

Add feedback

Evaluating LLMs on Chinese Idiom Translation

Yang, Cai, Dou, Yao, Heineman, David, Wu, Xiaofeng, Xu, Wei

arXiv.org Artificial IntelligenceAug-15-2025

Idioms, whose figurative meanings usually differ from their literal interpretations, are common in everyday language, especially in Chinese, where they often contain historical references and follow specific structural patterns. Despite recent progress in machine translation with large language models, little is known about Chinese idiom translation. In this work, we introduce IdiomEval, a framework with a comprehensive error taxonomy for Chinese idiom translation. We annotate 900 translation pairs from nine modern systems, including GPT-4o and Google Translate, across four domains: web, news, Wikipedia, and social media. We find these systems fail at idiom translation, producing incorrect, literal, partial, or even missing translations. The best-performing system, GPT-4, makes errors in 28% of cases. We also find that existing evaluation metrics measure idiom quality poorly with Pearson correlation below 0.48 with human ratings. We thus develop improved models that achieve F$_1$ scores of 0.68 for detecting idiom translation errors.

large language model, machine learning, translation, (18 more...)

arXiv.org Artificial Intelligence

2508.10421

Country:

North America > United States (1.00)
Asia (1.00)
Europe (0.93)

Genre: Research Report > Experimental Study (0.46)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Add feedback

Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data

Strobl, Eric V.

arXiv.org Machine LearningJul-28-2025

Causal inference in longitudinal biomedical data remains a central challenge, especially in psychiatry, where symptom heterogeneity and latent confounding frequently undermine classical estimators. Most existing methods for treatment effect estimation presuppose a fixed outcome variable and address confounding through observed covariate adjustment. However, the assumption of unconfoundedness may not hold for a fixed outcome in practice. To address this foundational limitation, we directly optimize the outcome definition to maximize causal identifiability. Our DEBIAS (Durable Effects with Backdoor-Invariant Aggregated Symptoms) algorithm learns non-negative, clinically interpretable weights for outcome aggregation, maximizing durable treatment effects and empirically minimizing both observed and latent confounding by leveraging the time-limited direct effects of prior treatments in psychiatric longitudinal data. The algorithm also furnishes an empirically verifiable test for outcome unconfoundedness. DEBIAS consistently outperforms state-of-the-art methods in recovering causal effects for clinically interpretable composite outcomes across comprehensive experiments in depression and schizophrenia.

artificial intelligence, correlation, machine learning, (18 more...)

arXiv.org Machine Learning

2506.16629

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)
Research Report > Strength High (0.68)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.46)

Add feedback

VTS-Guided AI Interaction Workflow for Business Insights

Ding, Sun, Enebeli, Ude, Atilhan, null, Manay, null, Pua, Ryan, Kotak, Kamal

arXiv.org Artificial IntelligenceJul-2-2025

Modern firms face a flood of dense, unstructured reports. Turning these documents into usable insights takes heavy effort and is far from agile when quick answers are needed. VTS-AI tackles this gap. It integrates Visual Thinking Strategies, which emphasize evidence-based observation, linking, and thinking, into AI agents, so the agents can extract business insights from unstructured text, tables, and images at scale. The system works in three tiers (micro, meso, macro). It tags issues, links them to source pages, and rolls them into clear action levers stored in a searchable YAML file. In tests on an 18-page business report, VTS-AI matched the speed of a one-shot ChatGPT prompt yet produced richer findings: page locations, verbatim excerpts, severity scores, and causal links. Analysts can accept or adjust these outputs in the same IDE, keeping human judgment in the loop. Early results show VTS-AI spots the direction of key metrics and flags where deeper number-crunching is needed. Next steps include mapping narrative tags to financial ratios, adding finance-tuned language models through a Model-Context Protocol, and building a Risk & Safety Layer to stress-test models and secure data. These upgrades aim to make VTS-AI a production-ready, audit-friendly tool for rapid business analysis.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2507.00347

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Banking & Finance (1.00)
Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Novel Multi-Task Teacher-Student Architecture with Self-Supervised Pretraining for 48-Hour Vasoactive-Inotropic Trend Analysis in Sepsis Mortality Prediction

Jin, Houji, Ashrafi, Negin, Alaei, Kamiar, Pishgar, Elham, Placencia, Greg, Pishgar, Maryam

arXiv.org Artificial IntelligenceFeb-23-2025

Sepsis is a major cause of ICU mortality, where early recognition and effective interventions are essential for improving patient outcomes. However, the vasoactive-inotropic score (VIS) varies dynamically with a patient's hemodynamic status, complicated by irregular medication patterns, missing data, and confounders, making sepsis prediction challenging. To address this, we propose a novel Teacher-Student multitask framework with self-supervised VIS pretraining via a Masked Autoencoder (MAE). The teacher model performs mortality classification and severity-score regression, while the student distills robust time-series representations, enhancing adaptation to heterogeneous VIS data. Compared to LSTM-based methods, our approach achieves an AUROC of 0.82 on MIMIC-IV 3.0 (9,476 patients), outperforming the baseline (0.74). SHAP analysis revealed that SOFA score (0.147) had the greatest impact on ICU mortality, followed by LODS (0.033), single marital status (0.031), and Medicaid insurance (0.023), highlighting the role of sociodemographic factors. SAPSII (0.020) also contributed significantly. These findings suggest that both clinical and social factors should be considered in ICU decision-making. Our novel multitask and distillation strategies enable earlier identification of high-risk patients, improving prediction accuracy and disease management, offering new tools for ICU decision support.

knowledge distillation, novel multi-task teacher-student architecture, self-supervised pretraining, (10 more...)

arXiv.org Artificial Intelligence

2502.16834

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > California > Los Angeles County > Pomona (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)

Add feedback

Gaze Behavior During a Long-Term, In-Home, Social Robot Intervention for Children with ASD

Ramnauth, Rebecca, Shic, Frederick, Scassellati, Brian

arXiv.org Artificial IntelligenceJan-5-2025

Atypical gaze behavior is a diagnostic hallmark of Autism Spectrum Disorder (ASD), playing a substantial role in the social and communicative challenges that individuals with ASD face. This study explores the impacts of a month-long, in-home intervention designed to promote triadic interactions between a social robot, a child with ASD, and their caregiver. Our results indicate that the intervention successfully promoted appropriate gaze behavior, encouraging children with ASD to follow the robot's gaze, resulting in more frequent and prolonged instances of spontaneous eye contact and joint attention with their caregivers. Additionally, we observed specific timelines for behavioral variability and novelty effects among users. Furthermore, diagnostic measures for ASD emerged as strong predictors of gaze patterns for both caregivers and children. These results deepen our understanding of ASD gaze patterns and highlight the potential for clinical relevance of robot-assisted interventions.

artificial intelligence, intervention, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2501.02583

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Connecticut > New Haven County > New Haven (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Autism (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Robots in the Home (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Benchmarking LLMs and SLMs for patient reported outcomes

Marengo, Matteo, Lévy, Jarod, Bibault, Jean-Emmanuel

arXiv.org Artificial IntelligenceDec-20-2024

LLMs have transformed the execution of numerous tasks, including those in the medical domain. Among these, summarizing patient-reported outcomes (PROs) into concise natural language reports is of particular interest to clinicians, as it enables them to focus on critical patient concerns and spend more time in meaningful discussions. While existing work with LLMs like GPT-4 has shown impressive results, real breakthroughs could arise from leveraging SLMs as they offer the advantage of being deployable locally, ensuring patient data privacy and compliance with healthcare regulations. This study benchmarks several SLMs against LLMs for summarizing patient-reported Q\&A forms in the context of radiotherapy. Using various metrics, we evaluate their precision and reliability. The findings highlight both the promise and limitations of SLMs for high-stakes medical tasks, fostering more efficient and privacy-preserving AI-driven healthcare solutions.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2412.16291

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Add feedback

Explaining GPT-4's Schema of Depression Using Machine Behavior Analysis

Ganesan, Adithya V, Varadarajan, Vasudha, Lal, Yash Kumar, Eijsbroek, Veerle C., Kjell, Katarina, Kjell, Oscar N. E., Dhanasekaran, Tanuja, Stade, Elizabeth C., Eichstaedt, Johannes C., Boyd, Ryan L., Schwartz, H. Andrew, Flek, Lucie

arXiv.org Artificial IntelligenceNov-20-2024

Use of large language models such as ChatGPT (GPT-4) for mental health support has grown rapidly, emerging as a promising route to assess and help people with mood disorders, like depression. However, we have a limited understanding of GPT-4's schema of mental disorders, that is, how it internally associates and interprets symptoms. In this work, we leveraged contemporary measurement theory to decode how GPT-4 interrelates depressive symptoms to inform both clinical utility and theoretical understanding. We found GPT-4's assessment of depression: (a) had high overall convergent validity (r = .71 with self-report on 955 samples, and r = .81 with experts judgments on 209 samples); (b) had moderately high internal consistency (symptom inter-correlates r = .23 to .78 ) that largely aligned with literature and self-report; except that GPT-4 (c) underemphasized suicidality's -- and overemphasized psychomotor's -- relationship with other symptoms, and (d) had symptom inference patterns that suggest nuanced hypotheses (e.g. sleep and fatigue are influenced by most other symptoms while feelings of worthlessness/guilt is mostly influenced by depressed mood).

depression, gpt-4, symptom, (13 more...)

arXiv.org Artificial Intelligence

2411.138

Country:

Europe > Sweden (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > Suffolk County > Stony Brook (0.04)
(13 more...)

Genre:

Workflow (1.00)
Research Report > Experimental Study (1.00)
Research Report > New Finding (0.94)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Development and Comparative Analysis of Machine Learning Models for Hypoxemia Severity Triage in CBRNE Emergency Scenarios Using Physiological and Demographic Data from Medical-Grade Devices

Nanini, Santino, Abid, Mariem, Mamouni, Yassir, Wiedemann, Arnaud, Jouvet, Philippe, Bourassa, Stephane

arXiv.org Artificial IntelligenceOct-30-2024

This paper presents the development of machine learning (ML) models to predict hypoxemia severity during emergency triage, especially in Chemical, Biological, Radiological, Nuclear, and Explosive (CBRNE) events, using physiological data from medical-grade sensors. Gradient Boosting Models (XGBoost, LightGBM, CatBoost) and sequential models (LSTM, GRU) were trained on physiological and demographic data from the MIMIC-III and IV datasets. A robust preprocessing pipeline addressed missing data, class imbalances, and incorporated synthetic data flagged with masks. Gradient Boosting Models (GBMs) outperformed sequential models in terms of training speed, interpretability, and reliability, making them well-suited for real-time decision-making. While their performance was comparable to that of sequential models, the GBMs used score features from six physiological variables derived from the enhanced National Early Warning Score (NEWS) 2, which we termed NEWS2+. This approach significantly improved prediction accuracy. While sequential models handled temporal data well, their performance gains did not justify the higher computational cost. A 5-minute prediction window was chosen for timely intervention, with minute-level interpolations standardizing the data. Feature importance analysis highlighted the significant role of mask and score features in enhancing both transparency and performance. Temporal dependencies proved to be less critical, as Gradient Boosting Models were able to capture key patterns effectively without relying on them. This study highlights ML's potential to improve triage and reduce alarm fatigue. Future work will integrate data from multiple hospitals to enhance model generalizability across clinical settings.

admission, interpolation, sequential model, (13 more...)

arXiv.org Artificial Intelligence

2410.23503

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Quebec > Montreal (0.05)
North America > United States > Iowa (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Add feedback