AITopics | Performance Analysis

Collaborating Authors

Performance Analysis

News Overviews Instructional Materials AI-Alerts Classics

FraPPE: Fast and Efficient Preference-based Pure Exploration

Das, Udvas, Shukla, Apurv, Basu, Debabrota

arXiv.org Machine LearningAug-25-2025

Preference-based Pure Exploration (PrePEx) aims to identify with a given confidence level the set of Pareto optimal arms in a vector-valued (aka multi-objective) bandit, where the reward vectors are ordered via a (given) preference cone $\mathcal{C}$. Though PrePEx and its variants are well-studied, there does not exist a computationally efficient algorithm that can optimally track the existing lower bound for arbitrary preference cones. We successfully fill this gap by efficiently solving the minimisation and maximisation problems in the lower bound. First, we derive three structural properties of the lower bound that yield a computationally tractable reduction of the minimisation problem. Then, we deploy a Frank-Wolfe optimiser to accelerate the maximisation problem in the lower bound. Together, these techniques solve the maxmin optimisation problem in $\mathcal{O}(KL^{2})$ time for a bandit instance with $K$ arms and $L$ dimensional reward, which is a significant acceleration over the literature. We further prove that our proposed PrePEx algorithm, FraPPE, asymptotically achieves the optimal sample complexity. Finally, we perform numerical experiments across synthetic and real datasets demonstrating that FraPPE achieves the lowest sample complexities to identify the exact Pareto set among the existing algorithms.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

2508.16487

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.68)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.45)

Add feedback

Terrain Classification for the Spot Quadrupedal Mobile Robot Using Only Proprioceptive Sensing

Villemure, Sophie, Silveira, Jefferson, Marshall, Joshua A.

arXiv.org Artificial IntelligenceAug-25-2025

--Quadrupedal mobile robots can traverse a wider range of terrain types than their wheeled counterparts but do not perform the same on all terrain types. These robots are prone to undesirable behaviours like sinking and slipping on challenging terrains. T o combat this issue, we propose a terrain classifier that provides information on terrain type that can be used in robotic systems to create a traversability map to plan safer paths for the robot to navigate. The work presented here is a terrain classifier developed for a Boston Dynamics Spot robot. Spot provides over 100 measured proprioceptive signals describing the motions of the robot and its four legs (e.g., foot penetration, forces, joint angles, etc.). The developed terrain classifier combines dimensionality reduction techniques to extract relevant information from the signals and then applies a classification technique to differentiate terrain based on traversability. Quadrupedal mobile robots can traverse a wider range of terrain types than their wheeled counterparts.

artificial intelligence, classifier, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/CCECE59415.2024.10667168

2508.16504

Country: North America > Canada (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Add feedback

Applications and Challenges of Fairness APIs in Machine Learning Software

Das, Ajoy, Uddin, Gias, Chowdhury, Shaiful, Akhond, Mostafijur Rahman, Hemmati, Hadi

arXiv.org Artificial IntelligenceAug-25-2025

Machine Learning software systems are frequently used in our day-to-day lives. Some of these systems are used in various sensitive environments to make life-changing decisions. Therefore, it is crucial to ensure that these AI/ML systems do not make any discriminatory decisions for any specific groups or populations. In that vein, different bias detection and mitigation open-source software libraries (aka API libraries) are being developed and used. In this paper, we conduct a qualitative study to understand in what scenarios these open-source fairness APIs are used in the wild, how they are used, and what challenges the developers of these APIs face while developing and adopting these libraries. We have analyzed 204 GitHub repositories (from a list of 1885 candidate repositories) which used 13 APIs that are developed to address bias in ML software. We found that these APIs are used for two primary purposes (i.e., learning and solving real-world problems), targeting 17 unique use-cases. Our study suggests that developers are not well-versed in bias detection and mitigation; they face lots of troubleshooting issues, and frequently ask for opinions and resources. Our findings can be instrumental for future bias-related software engineering research, and for guiding educators in developing more state-of-the-art curricula.

artificial intelligence, machine learning, repository, (18 more...)

arXiv.org Artificial Intelligence

2508.16377

Country:

North America > United States (1.00)
Europe (1.00)
North America > Canada (0.68)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.91)
Education > Educational Setting (0.65)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.67)

Add feedback

Cyber Physical Awareness via Intent-Driven Threat Assessment: Enhanced Space Networks with Intershell Links

Cetin, Selen Gecgel, Ovatman, Tolga, Kurt, Gunes Karabulut

arXiv.org Artificial IntelligenceAug-25-2025

--This letter addresses essential aspects of threat assessment by proposing intent-driven threat models that incorporate both capabilities and intents. We propose a holistic framework for cyber physical awareness (CPA) in space networks, pointing out that analyzing reliability and security separately can lead to overfitting on system-specific criteria. We structure our proposed framework in three main steps. First, we suggest an algorithm that extracts characteristic properties of the received signal to facilitate an intuitive understanding of potential threats. Second, we develop a multitask learning architecture where one task evaluates reliability-related capabilities while the other deciphers the underlying intentions of the signal. Finally, we propose an adaptable threat assessment that aligns with varying security and reliability requirements. The proposed framework enhances the robustness of threat detection and assessment, outperforming conventional sequential methods, and enables space networks with emerging intershell links to effectively address complex threat scenarios.

artificial intelligence, machine learning, threat, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/LWC.2025.3593066

2508.16314

Country:

North America > Canada (0.15)
Asia > Middle East > Republic of Türkiye (0.15)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Dac-Fake: A Divide and Conquer Framework for Detecting Fake News on Social Media

Jain, Mayank Kumar, Gopalani, Dinesh, Meena, Yogesh Kumar, Jain, Nishant

arXiv.org Artificial IntelligenceAug-25-2025

With the rapid evolution of technology and the Internet, the proliferation of fake news on social media has become a critical issue, leading to widespread misinformation that can cause societal harm. Traditional fact checking methods are often too slow to prevent the dissemination of false information. Therefore, the need for rapid, automated detection of fake news is paramount. We introduce DaCFake, a novel fake news detection model using a divide and conquer strategy that combines content and context based features. Our approach extracts over eighty linguistic features from news articles and integrates them with either a continuous bag of words or a skipgram model for enhanced detection accuracy. We evaluated the performance of DaCFake on three datasets including Kaggle, McIntire + PolitiFact, and Reuter achieving impressive accuracy rates of 97.88%, 96.05%, and 97.32%, respectively. Additionally, we employed a ten-fold cross validation to further enhance the model's robustness and accuracy. These results highlight the effectiveness of DaCFake in early detection of fake news, offering a promising solution to curb misinformation on social media platforms.

evolutionary algorithm, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.16223

Country: Asia > India (0.68)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.66)

Industry: Media > News (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

Add feedback

Generative Foundation Model for Structured and Unstructured Electronic Health Records

Sivarajkumar, Sonish, Zhang, Hang, Ji, Yuelyu, Bilalpur, Maneesh, Wu, Xizhi, Li, Chenyu, Kwak, Min Gu, Visweswaran, Shyam, Wang, Yanshan

arXiv.org Artificial IntelligenceAug-25-2025

Electronic health records (EHRs) are rich clinical data sources but complex repositories of patient data, spanning structured elements (demographics, vitals, lab results, codes), unstructured clinical notes and other modalities of data. Harnessing this heterogeneity is critical for improving patient outcomes. Recent advances in large language models (LLMs) have enabled foundation models that can learn from multiple data modalities and support clinical tasks. However, most current approaches simply serialize numeric EHR data into text, which risks losing temporal and quantitative detail. We introduce Generative Deep Patient (GDP), a multimodal foundation model that natively encodes structured EHR time-series via a CNN-Transformer encoder and fuses it with unstructured EHRs through cross-modal attention into a LLaMA-based decoder. GDP is trained in two stages: (1) generative pretraining, where it learns to produce clinical narratives from raw patient timelines while also performing masked feature prediction (MFP) and next time-step prediction (NTP) to capture temporal dynamics; and (2) multi-task fine-tuning for clinically meaningful predictions (e.g., heart failure, type 2 diabetes, 30-day readmission). In clinical prediction, GDP demonstrated superior performance on MIMIC-IV: heart failure AUROC = 0.923, type 2 diabetes AUROC = 0.817, and 30-day readmission AUROC = 0.627. For narrative generation, GDP achieved ROUGE-L = 0.135 and BERTScore-F1 = 0.545. In a blinded human evaluation, GDP-Instruct scored highest on faithfulness, fluency, and overall clinical utility, suggesting reduced hospital documentation workload without sacrificing accuracy. Our results demonstrate that a single multimodal foundation model can both predict clinically actionable events and generate high-quality clinical narratives. Furthermore, GDP's flexible architecture can be extended to additional modalities.

gdp, large language model, machine learning, (23 more...)

arXiv.org Artificial Intelligence

2508.16054

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Cross-Attention Multimodal Fusion for Breast Cancer Diagnosis: Integrating Mammography and Clinical Data with Explainability

Nantogmah, Muhaisin Tiyumba, Alhassan, Abdul-Barik, Alhassan, Salamudeen

arXiv.org Artificial IntelligenceAug-25-2025

A precise assessment of the risk of breast lesions can greatly lower it and assist physicians in choosing the best course of action. To categorise breast lesions, the majority of current computer-aided systems only use characteristics from mammograms. Although this method is practical, it does not completely utilise clinical reports' valuable information to attain the best results. When compared to utilising mammography alone, will clinical features greatly enhance the categorisation of breast lesions? How may clinical features and mammograms be combined most effectively? In what ways may explainable AI approaches improve the interpretability and reliability of models used to diagnose breast cancer? To answer these basic problems, a comprehensive investigation is desperately needed. In order to integrate mammography and categorical clinical characteristics, this study examines a number of multimodal deep networks grounded on feature concatenation, co-attention, and cross-attention. The model achieved an AUC-ROC of 0.98, accuracy of 0.96, F1-score of 0.94, precision of 0.92, and recall of 0.95 when tested on publicly accessible datasets (TCGA and CBIS-DDSM).

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.16

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
(2 more...)

Add feedback

Automated Multi-label Classification of Eleven Retinal Diseases: A Benchmark of Modern Architectures and a Meta-Ensemble on a Large Synthetic Dataset

Cao-Xue, Jerry, Comlekoglu, Tien, Xue, Keyi, Wang, Guanliang, Li, Jiang, Laurie, Gordon

arXiv.org Artificial IntelligenceAug-25-2025

The development of multi-label deep learning models for retinal disease classification is often hindered by the scarcity of large, expertly annotated clinical datasets due to patient privacy concerns and high costs. The recent release of SynFundus-1M, a high-fidelity synthetic dataset with over one million fundus images, presents a novel opportunity to overcome these barriers. To establish a foundational performance benchmark for this new resource, we developed an end-to-end deep learning pipeline, training six modern architectures (ConvNeXtV2, SwinV2, ViT, ResNet, EfficientNetV2, and the RETFound foundation model) to classify eleven retinal diseases using a 5-fold multi-label stratified cross-validation strategy. We further developed a meta-ensemble model by stacking the out-of-fold predictions with an XGBoost classifier. Our final ensemble model achieved the highest performance on the internal validation set, with a macro-average Area Under the Receiver Operating Characteristic Curve (AUC) of 0.9973. Critically, the models demonstrated strong generalization to three diverse, real-world clinical datasets, achieving an AUC of 0.7972 on a combined DR dataset, an AUC of 0.9126 on the AIROGS glaucoma dataset and a macro-AUC of 0.8800 on the multi-label RFMiD dataset. This work provides a robust baseline for future research on large-scale synthetic datasets and establishes that models trained exclusively on synthetic data can accurately classify multiple pathologies and generalize effectively to real clinical images, offering a viable pathway to accelerate the development of comprehensive AI systems in ophthalmology.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.15986

Country: North America > United States > Virginia > Albemarle County > Charlottesville (0.14)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Linkage Attacks Expose Identity Risks in Public ECG Data Sharing

Wang, Ziyu, Khatibi, Elahe, Firouzi, Farshad, Mousavi, Sanaz Rahimi, Chakrabarty, Krishnendu, Rahmani, Amir M.

arXiv.org Artificial IntelligenceAug-25-2025

-- The increasing availability of publicly shared electrocardiogram (ECG) data raises critical privacy concerns, as its biometric properties make individuals vulnerable to linkage attacks. Unlike prior studies that assume idealized adversarial capabilities, we evaluate ECG privacy risks under realistic conditions where attackers operate with partial knowledge. Using data from 109 participants across diverse real-world datasets, our approach achieves 85% accuracy in re-identifying individuals in public datasets while maintaining a 14.2% overall misclassification rate at an optimal confidence threshold, with 15.6% of unknown individuals misclassified as known and 12.8% of known individuals misclassified as unknown. These results highlight the inadequacy of simple anonymization techniques in preventing re-identification, demonstrating that even limited adversarial knowledge enables effective identity linkage. Our findings underscore the urgent need for privacy-preserving strategies, such as differential privacy, access control, and encrypted computation, to mitigate re-identification risks while ensuring the utility of shared biosignal data in healthcare applications. Electrocardiograms (ECG) capture the heart's electrical activity, serving as a key diagnostic tool for conditions like heart failure and arrhythmias [1], [2].

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.1585

Country: North America > United States (0.94)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Lexical Hints of Accuracy in LLM Reasoning Chains

Vanhoyweghen, Arne, Verbeken, Brecht, Algaba, Andres, Ginis, Vincent

arXiv.org Artificial IntelligenceAug-25-2025

Fine-tuning Large Language Models (LLMs) with reinforcement learning to produce an explicit Chain-of-Thought (CoT) before answering produces models that consistently raise overall performance on code, math, and general-knowledge benchmarks. However, on benchmarks where LLMs currently achieve low accuracy, such as Humanity's Last Exam (HLE), they often report high self-confidence, reflecting poor calibration. Here, we test whether measurable properties of the CoT provide reliable signals of an LLM's internal confidence in its answers. We analyze three feature classes: (i) CoT length, (ii) intra-CoT sentiment volatility, and (iii) lexicographic hints, including hedging words. Using DeepSeek-R1 and Claude 3.7 Sonnet on both Humanity's Last Exam (HLE), a frontier benchmark with very low accuracy, and Omni-MATH, a saturated benchmark of moderate difficulty, we find that lexical markers of uncertainty (e.g., $\textit{guess}$, $\textit{stuck}$, $\textit{hard}$) in the CoT are the strongest indicators of an incorrect response, while shifts in the CoT sentiment provide a weaker but complementary signal. CoT length is informative only on Omni-MATH, where accuracy is already high ($\approx 70\%$), and carries no signal on the harder HLE ($\approx 9\%$), indicating that CoT length predicts correctness only in the intermediate-difficulty benchmarks, i.e., inside the model's demonstrated capability, but still below saturation. Finally, we find that uncertainty indicators in the CoT are consistently more salient than high-confidence markers, making errors easier to predict than correct responses. Our findings support a lightweight post-hoc calibration signal that complements unreliable self-reported probabilities and supports safer deployment of LLMs.

arxiv preprint arxiv, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2508.15842

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Add feedback