AITopics | roc-auc score

roc-auc score

Melanoma Classification Through Deep Ensemble Learning and Explainable AI

Perera, Wadduwage Shanika, Islam, ABM, Pham, Van Vung, An, Min Kyung

arXiv.org Artificial Intelligence, Nov-4-2025

The skin is the largest organ in the human body, and approximately a third of all cancer cases are skin cancers. Melanoma is the deadliest form of skin cancer and is responsible for an overwhelming majority of skin cancer deaths. The number of melanoma deaths is expected to increase by 4.4% in 2023. Although the mortality is significant, when detected early, the 5-year survival rate for melanoma is over 99% (American Cancer Society, 2022). Currently, the most accurate way to diagnose melanoma is a biopsy. This is a penetrative surgical procedure that not only involves higher costs but also carries risks of developing various infectious diseases (Lakhtakia et al., 2009). Thus, the usual clinical practice of melanoma diagnosis is visual inspection using dermoscopy by dermatologists or specially trained clinicians. This approach presents challenges, primarily due to its resource-intensive nature in terms of time and cost. This method's accuracy of melanoma diagnosis is approximately

data mining, machine learning, natural language, (20 more...)

doi: 10.5220/0012575400003657

2511.00246

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Skin Cancer (1.00)
Health & Medicine > Therapeutic Area > Dermatology (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

e43739bba7cdb577e9e3e4e42447f5a5-AuthorFeedback.pdf

Neural Information Processing Systems, Aug-17-2025, 00:37:50 GMT

We thank the reviewers for their time and valuable feedback. Below, we clarify a number of important points raised by the reviewers. Reviewers raise concerns about multi-modal embeddings. We will highlight this limitation in Sec. [...] R3 suggests that "the authors can adapt the FOL queries to other [...]" We argue the differences in tasks and setups below.

betae, differential entropy, query, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.58)

Assessing LLM Text Detection in Educational Contexts: Does Human Contribution Affect Detection?

Gehring, Lukas, Paaßen, Benjamin

arXiv.org Artificial Intelligence, Aug-12-2025

Recent advancements in Large Language Models (LLMs) and their increased accessibility have made it easier than ever for students to automatically generate texts, posing new challenges for educational institutions. To enforce norms of academic integrity and ensure students' learning, learning analytics methods to automatically detect LLM-generated text appear increasingly appealing. This paper benchmarks the performance of different state-of-the-art detectors in educational contexts, introducing a novel dataset, called Generative Essay Detection in Education (GEDE), containing over 900 student-written essays and over 12,500 LLM-generated essays from various domains. To capture the diversity of LLM usage practices in generating text, we propose the concept of contribution levels, representing students' contribution to a given assignment. These levels range from purely human-written texts, to slightly LLM-improved versions, to fully LLM-generated texts, and finally to active attacks on the detector by "humanizing" generated texts. We show that most detectors struggle to accurately classify texts of intermediate student contribution levels, like LLM-improved human-written texts. Detectors are particularly likely to produce false positives, which is problematic in educational settings where false suspicions can severely impact students' lives. Our dataset, code, and additional supplementary materials are publicly available at https://github.com/lukasgehring/Assessing-LLM-Text-Detection-in-Educational-Contexts.

large language model, machine learning, natural language, (18 more...)

2508.08096

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification

Zhang, Anqi, Chen, Yulin, Pan, Jane, Zhao, Chen, Panda, Aurojit, Li, Jinyang, He, He

arXiv.org Artificial Intelligence, Apr-9-2025

Reasoning models have achieved remarkable performance on tasks like math and logical reasoning thanks to their ability to search during reasoning. However, they still suffer from overthinking, often performing unnecessary reasoning steps even after reaching the correct answer. This raises the question: can models evaluate the correctness of their intermediate answers during reasoning? In this work, we study whether reasoning models encode information about answer correctness through probing the model's hidden states. The resulting probe can verify intermediate answers with high accuracy and produces highly calibrated scores. Additionally, we find models' hidden states encode correctness of future answers, enabling early prediction of the correctness before the intermediate answer is fully formulated. We then use the probe as a verifier to decide whether to exit reasoning at intermediate answers during inference, reducing the number of inference tokens by 24% without compromising performance. These findings confirm that reasoning models do encode a notion of correctness yet fail to exploit it, revealing substantial untapped potential to enhance their efficiency.

large language model, machine learning, natural language, (20 more...)

2504.05419

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)
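
The abstract above describes the probe's scores as highly calibrated. One common way to quantify calibration is the expected calibration error (ECE): bin predictions by confidence and compare each bin's average confidence with its empirical accuracy. The sketch below is a generic illustration in plain Python, not the authors' evaluation code; the function name, the bin count, and the toy inputs are assumptions made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: predicted probabilities in [0, 1] that an answer is correct.
    correct:     1 if the answer was actually correct, else 0.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")

    # Group (confidence, outcome) pairs into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # c == 1.0 goes in the last bin
        bins[idx].append((c, y))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(y for _, y in items) / len(items)
        ece += (len(items) / total) * abs(avg_conf - accuracy)
    return ece


# Hypothetical probe outputs: a confidence of 0.25 is right one time in four,
# and a confidence of 0.75 is right three times in four, so the gap is zero.
calibrated_conf = [0.25] * 4 + [0.75] * 4
calibrated_hits = [1, 0, 0, 0, 1, 1, 1, 0]
print(expected_calibration_error(calibrated_conf, calibrated_hits))
```

A lower value means the scores can be read more directly as probabilities, which matters when a fixed confidence threshold is used to decide whether to stop reasoning early.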

Reducing False Ventricular Tachycardia Alarms in ICU Settings: A Machine Learning Approach

Farayola, Grace Funmilayo, Akintola, Akinyemi Sadeeq, Fagbohun, Oluwole, Oforgu, Chukwuka Michael, Kayode, Bisola Faith, Chimezie, Christian, Kadri, Temitope, Oludotun, Abiola, Ogbeide, Nelson, Michael, Mgbame, Ifaturoti, Adeseye, Oloyede, Toyese

arXiv.org Artificial Intelligence, Mar-18-2025

False arrhythmia alarms in intensive care units (ICUs) are a significant challenge, contributing to alarm fatigue and potentially compromising patient safety. Ventricular tachycardia (VT) alarms are particularly difficult to detect accurately due to their complex nature. This paper presents a machine learning approach to reduce false VT alarms using the VTaC dataset, a benchmark dataset of annotated VT alarms from ICU monitors. We extract time-domain and frequency-domain features from waveform data, preprocess the data, and train deep learning models to classify true and false VT alarms. Our results demonstrate high performance, with ROC-AUC scores exceeding 0.96 across various training configurations. This work highlights the potential of machine learning to improve the accuracy of VT alarm detection in clinical settings.

artificial intelligence, deep learning, machine learning, (18 more...)

2503.14621

Country:

Europe > Portugal > Lisbon > Lisbon (0.15)
Europe > United Kingdom > England > Greater London > London (0.07)
Europe > United Kingdom > England > Bristol (0.05)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
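
The ROC-AUC score reported in the abstract above has a direct interpretation: it is the probability that a randomly chosen positive example (here, a true alarm) receives a higher score than a randomly chosen negative one (a false alarm), with ties counted as one half. The sketch below computes it by comparing every positive/negative pair in plain Python. The function name and the toy labels and scores are assumptions made for illustration; they are not taken from the VTaC dataset or the paper's code.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs: 1 = true alarm, 0 = false alarm.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]

# 8 of the 9 positive/negative pairs are ordered correctly, so this is 8/9.
print(roc_auc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration; rank-based formulations give the same value more efficiently on large datasets.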

Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data

Akkus, Atilla, Li, Mingjie, Chu, Junjie, Backes, Michael, Zhang, Yang, Sav, Sinem

arXiv.org Artificial Intelligence, Sep-12-2024

Large language models (LLMs) have shown considerable success in a range of domain-specific tasks, especially after fine-tuning. However, fine-tuning with real-world data usually leads to privacy risks, particularly when the fine-tuning samples exist in the pre-training data. To avoid the shortcomings of real data, developers often employ methods to automatically generate synthetic data for fine-tuning, as data generated by traditional models are often far away from the real-world pertaining data. However, given the advanced capabilities of LLMs, the distinction between real data and LLM-generated data has become negligible, which may also lead to privacy risks like real data. In this paper, we present an empirical analysis of this underexplored issue by investigating a key question: "Does fine-tuning with LLM-generated data enhance privacy, or does it pose additional privacy risks?" Based on the structure of LLM-generated data, our research focuses on two primary approaches to fine-tuning with generated data: supervised fine-tuning with unstructured generated data and self-instruct tuning. The number of successful Personal Information Identifier (PII) extractions for Pythia after fine-tuning on our generated data increased by over 20%. Furthermore, the ROC-AUC score of membership inference attacks for Pythia-6.9b after self-instruct methods improves by more than 40% over base models. The results indicate the potential privacy risks in LLMs when fine-tuning with the generated data.

dataset, fine-tuning, privacy risk, (15 more...)

2409.11423

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > North Carolina > Nash County > Rocky Mount (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Investigation of unsupervised and supervised hyperspectral anomaly detection

Hossain, Mazharul, Robinson, Aaron, Wang, Lan, Preza, Chrysanthe

arXiv.org Artificial Intelligence, Aug-13-2024

Hyperspectral sensing is a valuable tool for detecting anomalies and distinguishing between materials in a scene. Hyperspectral anomaly detection (HS-AD) helps characterize the captured scenes and separates them into anomaly and background classes. It is vital in agriculture, environment, and military applications such as RSTA (reconnaissance, surveillance, and target acquisition) missions. We previously designed an equal voting ensemble of hyperspectral unmixing and three unsupervised HS-AD algorithms. We later utilized a supervised classifier to determine the weights of a voting ensemble, creating a hybrid of heterogeneous unsupervised HS-AD algorithms with a supervised classifier in a model stacking, which improved detection accuracy. However, supervised classification methods usually fail to detect novel or unknown patterns that substantially deviate from those seen previously. In this work, we evaluate our technique and other supervised and unsupervised methods using general hyperspectral data to provide new insights.

anomaly detection, dataset, detection, (12 more...)

2408.07114

Country:

North America > United States > Arizona (0.07)
North America > United States > California > San Diego County > San Diego (0.06)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
Europe > Spain > Basque Country (0.04)

Genre: Research Report (1.00)

Industry: Government (0.94)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.33)

Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data

Hayes, Nicole, Merkurjev, Ekaterina, Wei, Guo-Wei

arXiv.org Artificial Intelligence, Jun-19-2024

Data sets with imbalanced class sizes, often where one class size is much smaller than that of others, occur extremely often in various applications, including those with biological foundations, such as drug discovery and disease diagnosis. Thus, it is extremely important to be able to identify data elements of classes of various sizes, as a failure to detect can result in heavy costs. However, many data classification algorithms do not perform well on imbalanced data sets as they often fail to detect elements belonging to underrepresented classes. In this paper, we propose the BTDT-MBO algorithm, incorporating Merriman-Bence-Osher (MBO) techniques and a bidirectional transformer, as well as distance correlation and decision threshold adjustments, for data classification problems on highly imbalanced molecular data sets, where the sizes of the classes vary greatly. The proposed method not only integrates adjustments in the classification threshold for the MBO algorithm in order to help deal with the class imbalance, but also uses a bidirectional transformer model based on an attention mechanism for self-supervised learning. Additionally, the method implements distance correlation as a weight function for the similarity graph-based framework on which the adjusted MBO algorithm operates. The proposed model is validated using six molecular data sets, and we also provide a thorough comparison to other competing algorithms. The computational experiments show that the proposed method performs better than competing techniques even when the class imbalance ratio is very high.

algorithm, roc-auc score, threshold, (15 more...)

2406.06479

Country: North America > United States > Michigan (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
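
The abstract above mentions adjusting the decision threshold to cope with class imbalance. The general idea can be sketched independently of the BTDT-MBO algorithm: instead of labelling an example positive when its score reaches 0.5, sweep candidate thresholds on held-out data and keep the one that maximizes a metric that weights both classes equally. The code below is a generic illustration, not the authors' method; the function names, the choice of balanced accuracy, and the toy validation data are assumptions made for this example.

```python
def balanced_accuracy(labels, scores, threshold):
    """Mean of the true-positive rate and true-negative rate at a threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one example of each class")
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return 0.5 * (tp / n_pos + tn / n_neg)


def best_threshold(labels, scores):
    """Pick the observed score that maximizes balanced accuracy."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: balanced_accuracy(labels, scores, t))


# Hypothetical validation set with an 8:2 imbalance. The minority class gets
# systematically low scores, so the default threshold of 0.5 finds none of it.
val_labels = [0] * 8 + [1] * 2
val_scores = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.35, 0.40, 0.45]

print(balanced_accuracy(val_labels, val_scores, 0.5))   # default threshold
chosen = best_threshold(val_labels, val_scores)
print(chosen, balanced_accuracy(val_labels, val_scores, chosen))
```

Note that the threshold changes accuracy-style metrics but leaves ROC-AUC unchanged, because ROC-AUC depends only on how the scores rank the examples.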

Watermarking Generative Tabular Data

He, Hengzhi, Yu, Peiyu, Ren, Junpeng, Wu, Ying Nian, Cheng, Guang

arXiv.org Artificial Intelligence, May-22-2024

In this paper, we introduce a simple yet effective tabular data watermarking mechanism with statistical guarantees. We show theoretically that the proposed watermark can be effectively detected while faithfully preserving data fidelity, and that it demonstrates appealing robustness against additive noise attacks. The general idea is to achieve the watermarking through a strategic embedding based on simple data binning. Specifically, it divides the feature's value range into finely segmented intervals and embeds watermarks into selected "green list" intervals. To detect the watermarks, we develop a principled statistical hypothesis-testing framework with minimal assumptions: it remains valid as long as the underlying data distribution has a continuous density function. The watermarking efficacy is demonstrated through rigorous theoretical analysis and empirical validation, highlighting its utility in enhancing the security of synthetic and real-world datasets.

green list, tabular data, watermark, (14 more...)

2405.14018

Country:

North America > United States > California (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
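
The binning idea in the abstract above can be illustrated with a toy version. This is a simplified sketch under stated assumptions, not the authors' scheme: it assumes a single feature scaled to [0, 1), picks half of the equal-width bins as the "green list" using a secret seed, moves any value outside a green bin to the centre of the nearest green bin, and detects the watermark with a one-sided z-test on the number of values that fall in green bins (about half are expected without a watermark). All function names and parameters are hypothetical.

```python
import math
import random


def make_green_list(n_bins, seed):
    """Secretly choose half of the bin indices as 'green'."""
    rng = random.Random(seed)
    bins = list(range(n_bins))
    rng.shuffle(bins)
    return set(bins[: n_bins // 2])


def bin_index(x, n_bins):
    """Index of the equal-width bin containing x, for x in [0, 1]."""
    return min(int(x * n_bins), n_bins - 1)


def embed(values, n_bins, green):
    """Move every value that is not in a green bin into the nearest green bin."""
    width = 1.0 / n_bins
    out = []
    for x in values:
        b = bin_index(x, n_bins)
        if b not in green:
            nearest = min(green, key=lambda g: abs(g - b))
            x = (nearest + 0.5) * width  # centre of the nearest green bin
        out.append(x)
    return out


def z_score(values, n_bins, green):
    """z-statistic for 'green hits' against a null rate of one half."""
    n = len(values)
    hits = sum(1 for x in values if bin_index(x, n_bins) in green)
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)


N_BINS = 100
green_bins = make_green_list(N_BINS, seed=7)
data_rng = random.Random(0)
clean = [data_rng.random() for _ in range(400)]
marked = embed(clean, N_BINS, green_bins)

print(z_score(clean, N_BINS, green_bins))   # typically small for clean data
print(z_score(marked, N_BINS, green_bins))  # sqrt(400) = 20: every value is green
```

With fine bins, each value moves by only a few bin widths, which is the intuition behind preserving data fidelity; the detector needs only the green list, not the original data.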

Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank

Menestrel, Thomas Le, Craig, Erin, Tibshirani, Robert, Hastie, Trevor, Rivas, Manuel

arXiv.org Artificial Intelligence, May-7-2024

Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals, underscoring a critical gap in genetic research. Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data. We evaluate the performance of Group-LASSO INTERaction-NET (glinternet) and pretrained lasso in disease prediction focusing on diverse ancestries in the UK Biobank. Models were trained on data from White British and other ancestries and validated across a cohort of over 96,000 individuals for 8 diseases. Out of 96 models trained, we report 16 with statistically significant incremental predictive performance in terms of ROC-AUC scores (p-value < 0.05), found for diabetes, arthritis, gall stones, cystitis, asthma and osteoarthritis. For the interaction and pretrained models that outperformed the baseline, the PRS score was the primary driver behind prediction. Our findings indicate that both interaction terms and pre-training can enhance prediction accuracy but for a limited set of diseases and moderate improvements in accuracy.

ancestry, dataset, glinternet, (11 more...)

2404.17626

Country:

Europe > United Kingdom (0.49)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.32)
