AITopics | Performance Analysis

Collaborating Authors

Performance Analysis

News Overviews Instructional Materials AI-Alerts Classics

SSLfmm: An R Package for Semi-Supervised Learning with a Mixed-Missingness Mechanism in Finite Mixture Models

arXiv.org Machine LearningDec-9-2025

Semi-supervised learning (SSL) constructs classifiers from datasets in which only a subset of observations is labelled, a situation that naturally arises because obtaining labels often requires expert judgement or costly manual effort. This motivates methods that integrate labelled and unlabelled data within a learning framework. Most SSL approaches assume that label absence is harmless, typically treated as missing completely at random or ignored, but in practice, the missingness process can be informative, as the chances of an observation being unlabelled may depend on the ambiguity of its feature vector. In such cases, the missingness indicators themselves provide additional information that, if properly modelled, may improve estimation efficiency. The \textbf{SSLfmm} package for R is designed to capture this behaviour by estimating the Bayes' classifier under a finite mixture model in which each component corresponding to a class follows a multivariate normal distribution. It incorporates a mixed-missingness mechanism that combines a missing completely at random (MCAR) component with a (non-ignorable) missing at random (MAR) component, the latter modelling the probability of label missingness as a logistic function of the entropy based on the features. Parameters are estimated via an Expectation--Conditional Maximisation algorithm. In the two-class Gaussian setting with arbitrary covariance matrices, the resulting classifier trained on partially labelled data may, in some cases, achieve a lower misclassification rate than the supervised version in the case where all the labels are known. The package includes a practical tool for modelling and illustrates its performance through simulated examples.

covariance matrix, error rate, mixed-missingness mechanism, (14 more...)

arXiv.org Machine Learning

2512.03322

Country:

Oceania > Australia > Queensland (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Performance of the SafeTerm AI-Based MedDRA Query System Against Standardised MedDRA Queries

Vandenhende, Francois, Georgiou, Anna, Georgiou, Michalis, Psaras, Theodoros, Karekla, Ellie, Hadjicosta, Elena

arXiv.org Artificial IntelligenceDec-9-2025

In pre-market drug safety review, grouping related adverse event terms into SMQs or OCMQs is critical for signal detection. We assess the performance of SafeTerm Automated Medical Query (AMQ) on MedDRA SMQs. The AMQ is a novel quantitative artificial intelligence system that understands and processes medical terminology and automatically retrieves relevant MedDRA Preferred Terms (PTs) for a given input query, ranking them by a relevance score (0-1) using multi-criteria statistical methods. The system (SafeTerm) embeds medical query terms and MedDRA PTs in a multidimensional vector space, then applies cosine similarity, and extreme-value clustering to generate a ranked list of PTs. Validation was conducted against tier-1 SMQs (110 queries, v28.1). Precision, recall and F1 were computed at multiple similarity-thresholds, defined either manually or using an automated method. High recall (94%)) is achieved at moderate similarity thresholds, indicative of good retrieval sensitivity. Higher thresholds filter out more terms, resulting in improved precision (up to 89%). The optimal threshold (0.70)) yielded an overall recall of (48%) and precision of (45%) across all 110 queries. Restricting to narrow-term PTs achieved slightly better performance at an increased (+0.05) similarity threshold, confirming increased relatedness of narrow versus broad terms. The automatic threshold (0.66) selection prioritizes recall (0.58) to precision (0.29). SafeTerm AMQ achieves comparable, satisfactory performance on SMQs and sanitized OCMQs. It is therefore a viable supplementary method for automated MedDRA query generation, balancing recall and precision. We recommend using suitable MedDRA PT terminology in query formulation and applying the automated threshold method to optimise recall. Increasing similarity scores allows refined, narrow terms selection.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2512.07552

Country:

North America > United States (0.15)
Europe > Middle East > Cyprus (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.50)

Add feedback

DeepAgent: A Dual Stream Multi Agent Fusion for Robust Multimodal Deepfake Detection

Zaman, Sayeem Been, Karim, Wasimul, Abian, Arefin Ittesafun, Mohamed, Reem E., Islam, Md Rafiqul, Karim, Asif, Azam, Sami

arXiv.org Artificial IntelligenceDec-9-2025

The increasing use of synthetic media, particularly deepfakes, is an emerging challenge for digital content verification. Although recent studies use both audio and visual information, most integrate these cues within a single model, which remains vulnerable to modality mismatches, noise, and manipulation. To address this gap, we propose DeepAgent, an advanced multi-agent collaboration framework that simultaneously incorporates both visual and audio modalities for the effective detection of deepfakes. DeepAgent consists of two complementary agents. Agent-1 examines each video with a streamlined AlexNet-based CNN to identify the symbols of deepfake manipulation, while Agent-2 detects audio-visual inconsistencies by combining acoustic features, audio transcriptions from Whisper, and frame-reading sequences of images through EasyOCR. Their decisions are fused through a Random Forest meta-classifier that improves final performance by taking advantage of the different decision boundaries learned by each agent. This study evaluates the proposed framework using three benchmark datasets to demonstrate both component-level and fused performance. Agent-1 achieves a test accuracy of 94.35% on the combined Celeb-DF and FakeAVCeleb datasets. On the FakeAVCeleb dataset, Agent-2 and the final meta-classifier attain accuracies of 93.69% and 81.56%, respectively. In addition, cross-dataset validation on DeepFakeTIMIT confirms the robustness of the meta-classifier, which achieves a final accuracy of 97.49%, and indicates a strong capability across diverse datasets. These findings confirm that hierarchy-based fusion enhances robustness by mitigating the weaknesses of individual modalities and demonstrate the effectiveness of a multi-agent approach in addressing diverse types of manipulations in deepfakes.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2512.07351

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

IFFair: Influence Function-driven Sample Reweighting for Fair Classification

Yang, Jingran, Zhang, Min, Zhang, Lingfeng, Wang, Zhaohui, Zhang, Yonggang

arXiv.org Artificial IntelligenceDec-9-2025

Because machine learning has significantly improved efficiency and convenience in the society, it's increasingly us ed to assist or replace human decision-making. However, the data-based pa ttern makes related algorithms learn and even exacerbate potential bia s in samples, resulting in discriminatory decisions against certain unp rivileged groups, depriving them of the rights to equal treatment, thus damagi ng the social well-being and hindering the development of related applic ations. Therefore, we propose a pre-processing method IFFair based on the influence function. Compared with other fairness optimization appro aches, IFFair only uses the influence disparity of training samples on diffe rent groups as a guidance to dynamically adjust the sample weights durin g training without modifying the network structure, data features and decision boundaries. To evaluate the validity of IFFair, we conduct e xperiments on multiple real-world datasets and metrics. The experimenta l results show that our approach mitigates bias of multiple accepted metri cs in the classification setting, including demographic parity, equaliz ed odds, equality of opportunity and error rate parity without conflicts. It al so demonstrates that IFFair achieves better trade-off between multi ple utility and fairness metrics compared with previous pre-processing me thods.

artificial intelligence, iffair, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2512.07249

Country: Asia (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Ensembling LLM-Induced Decision Trees for Explainable and Robust Error Detection

Wang, Mengqi, Wang, Jianwei, Liu, Qing, Xu, Xiwei, Xing, Zhenchang, Zhu, Liming, Zhang, Wenjie

arXiv.org Artificial IntelligenceDec-9-2025

Error detection (ED), which aims to identify incorrect or inconsistent cell values in tabular data, is important for ensuring data quality. Recent state-of-the-art ED methods leverage the pre-trained knowledge and semantic capability embedded in large language models (LLMs) to directly label whether a cell is erroneous. However, this LLM-as-a-labeler pipeline (1) relies on the black box, implicit decision process, thus failing to provide explainability for the detection results, and (2) is highly sensitive to prompts, yielding inconsistent outputs due to inherent model stochasticity, therefore lacking robustness. To address these limitations, we propose an LLM-as-an-inducer framework that adopts LLM to induce the decision tree for ED (termed TreeED) and further ensembles multiple such trees for consensus detection (termed ForestED), thereby improving explainability and robustness. Specifically, based on prompts derived from data context, decision tree specifications and output requirements, TreeED queries the LLM to induce the decision tree skeleton, whose root-to-leaf decision paths specify the stepwise procedure for evaluating a given sample. Each tree contains three types of nodes: (1) rule nodes that perform simple validation checks (e.g., format or range), (2) Graph Neural Network (GNN) nodes that capture complex patterns (e.g., functional dependencies), and (3) leaf nodes that output the final decision types (error or clean). Furthermore, ForestED employs uncertainty-based sampling to obtain multiple row subsets, constructing a decision tree for each subset using TreeED. It then leverages an Expectation-Maximization-based algorithm that jointly estimates tree reliability and optimizes the consensus ED prediction. Extensive xperiments demonstrate that our methods are accurate, explainable and robust, achieving an average F1-score improvement of 16.1% over the best baseline.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2512.07246

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

PICKT: Practical Interlinked Concept Knowledge Tracing for Personalized Learning using Knowledge Map Concept Relations

Lee, Wonbeen, Lee, Channyoung, Sohn, Junho, Cho, Hansam

arXiv.org Artificial IntelligenceDec-9-2025

With the recent surge in personalized learning, Intelligent Tutoring Systems (ITS) that can accurately track students' individual knowledge states and provide tailored learning paths based on this information are in demand as an essential task. This paper focuses on the core technology of Knowledge Tracing (KT) models that analyze students' sequences of interactions to predict their knowledge acquisition levels. However, existing KT models suffer from limitations such as restricted input data formats, cold start problems arising with new student enrollment or new question addition, and insufficient stability in real-world service environments. To overcome these limitations, a Practical Interlinked Concept Knowledge Tracing (PICKT) model that can effectively process multiple types of input data is proposed. Specifically, a knowledge map structures the relationships among concepts considering the question and concept text information, thereby enabling effective knowledge tracing even in cold start situations. Experiments reflecting real operational environments demonstrated the model's excellent performance and practicality. The main contributions of this research are as follows. First, a model architecture that effectively utilizes diverse data formats is presented. Second, significant performance improvements are achieved over existing models for two core cold start challenges: new student enrollment and new question addition. Third, the model's stability and practicality are validated through delicate experimental design, enhancing its applicability in real-world product environments. This provides a crucial theoretical and technical foundation for the practical implementation of next-generation ITS.

artificial intelligence, expert system, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2512.07179

Genre: Research Report > New Finding (1.00)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Add feedback

Using Vision-Language Models as Proxies for Social Intelligence in Human-Robot Interaction

Bu, Fanjun, Tsai, Melina, Tjokro, Audrey, Bhattacharjee, Tapomayukh, Ortiz, Jorge, Ju, Wendy

arXiv.org Artificial IntelligenceDec-9-2025

Robots operating in everyday environments must often decide when and whether to engage with people, yet such decisions often hinge on subtle nonverbal cues that unfold over time and are difficult to model explicitly. Drawing on a five-day Wizard-of-Oz deployment of a mobile service robot in a university cafe, we analyze how people signal interaction readiness through nonverbal behaviors and how expert wizards use these cues to guide engagement. Motivated by these observations, we propose a two-stage pipeline in which lightweight perceptual detectors (gaze shifts and proxemics) are used to selectively trigger heavier video-based vision-language model (VLM) queries at socially meaningful moments. We evaluate this pipeline on replayed field interactions and compare two prompting strategies. Our findings suggest that selectively using VLMs as proxies for social reasoning enables socially responsive robot behavior, allowing robots to act appropriately by attending to the cues people naturally provide in real-world interactions.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2512.07177

Country: North America > United States (0.47)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.47)
(2 more...)

Add feedback

Transformation of Biological Networks into Images via Semantic Cartography for Visual Interpretation and Scalable Deep Analysis

Mostafa, Sakib, Xing, Lei, Islam, Md. Tauhidul

arXiv.org Artificial IntelligenceDec-9-2025

Complex biological networks are fundamental to biomedical science, capturing interactions among molecules, cells, genes, and tissues. Deciphering these networks is critical for understanding health and disease, yet their scale and complexity represent a daunting challenge for current computational methods. Traditional biological network analysis methods, including deep learning approaches, while powerful, face inherent challenges such as limited scalability, oversmoothing long-range dependencies, difficulty in multimodal integration, expressivity bounds, and poor interpretability. We present Graph2Image, a framework that transforms large biological networks into sets of two-dimensional images by spatially arranging representative network nodes on a 2D grid. This transformation decouples the nodes as images, enabling the use of convolutional neural networks (CNNs) with global receptive fields and multi-scale pyramids, thus overcoming limitations of existing biological network analysis methods in scalability, memory efficiency, and long-range context capture. Graph2Image also facilitates seamless integration with other imaging and omics modalities and enhances interpretability through direct visualization of node-associated images. When applied to several large-scale biological network datasets, Graph2Image improved classification accuracy by up to 67.2% over existing methods and provided interpretable visualizations that revealed biologically coherent patterns. It also allows analysis of very large biological networks (nodes > 1 billion) on a personal computer. Graph2Image thus provides a scalable, interpretable, and multimodal-ready approach for biological network analysis, offering new opportunities for disease diagnosis and the study of complex biological systems.

artificial intelligence, graph2image, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2512.0704

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Carcinoma (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Comprehensive Study of Supervised Machine Learning Models for Zero-Day Attack Detection: Analyzing Performance on Imbalanced Data

Lotfi, Zahra, Lotfi, Mostafa

arXiv.org Artificial IntelligenceDec-9-2025

Among the various types of cyberattacks, identifying zero-day attacks is problematic because they are unknown to security systems as their pattern and characteristics do not match known blacklisted attacks. There are many Machine Learning (ML) models designed to analyze and detect network attacks, especially using supervised models. However, these models are designed to classify samples (normal and attacks) based on the patterns they learn during the training phase, so they perform inefficiently on unseen attacks. This research addresses this issue by evaluating five different supervised models to assess their performance and execution time in predicting zero-day attacks and find out which model performs accurately and quickly. The goal is to improve the performance of these supervised models by not only proposing a framework that applies grid search, dimensionality reduction and oversampling methods to overcome the imbalance problem, but also comparing the effectiveness of oversampling on ml model metrics, in particular the accuracy. To emulate attack detection in real life, this research applies a highly imbalanced data set and only exposes the classifiers to zero-day attacks during the testing phase, so the models are not trained to flag the zero-day attacks. Our results show that Random Forest (RF) performs best under both oversampling and non-oversampling conditions, this increased effectiveness comes at the cost of longer processing times. Therefore, we selected XG Boost (XGB) as the top model due to its fast and highly accurate performance in detecting zero-day attacks.

artificial intelligence, machine learning, zero-day attack, (16 more...)

arXiv.org Artificial Intelligence

2512.0703

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.49)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Utilizing Multi-Agent Reinforcement Learning with Encoder-Decoder Architecture Agents to Identify Optimal Resection Location in Glioblastoma Multiforme Patients

Arun, Krishna, Bhattachrya, Moinak, Goel, Paras

arXiv.org Artificial IntelligenceDec-9-2025

Currently, there is a noticeable lack of AI in the medical field to support doctors in treating heterogenous brain tumors such as Glioblastoma Multiforme (GBM), the deadliest human cancer in the world with a five-year survival rate of just 5.1%. This project develops an AI system offering the only end-to-end solution by aiding doctors with both diagnosis and treatment planning. In the diagnosis phase, a sequential decision-making framework consisting of 4 classification models (Convolutional Neural Networks and Support Vector Machine) are used. Each model progressively classifies the patient's brain into increasingly specific categories, with the final step being named diagnosis. For treatment planning, an RL system consisting of 3 generative models is used. First, the resection model (diffusion model) analyzes the diagnosed GBM MRI and predicts a possible resection outcome. Second, the radiotherapy model (Spatio-Temporal Vision Transformer) generates an MRI of the brain's progression after a user-defined number of weeks. Third, the chemotherapy model (Diffusion Model) produces the post-treatment MRI. A survival rate calculator (Convolutional Neural Network) then checks if the generated post treatment MRI has a survival rate within 15% of the user defined target. If not, a feedback loop using proximal policy optimization iterates over this system until an optimal resection location is identified. When compared to existing solutions, this project found 3 key findings: (1) Using a sequential decision-making framework consisting of 4 small diagnostic models reduced computing costs by 22.28x, (2) Transformers regression capabilities decreased tumor progression inference time by 113 hours, and (3) Applying Augmentations resembling Real-life situations improved overall DICE scores by 2.9%. These results project to increase survival rates by 0.9%, potentially saving approximately 2,250 lives.

machine learning, reinforcement learning, utilizing marl system, (12 more...)

arXiv.org Artificial Intelligence

2512.0699

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Childhood Cancer (0.61)
Health & Medicine > Therapeutic Area > Oncology > Brain Cancer (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback