AITopics | Performance Analysis

Collaborating Authors

Performance Analysis

News Overviews Instructional Materials AI-Alerts Classics

Balancing Interpretability and Performance in Reinforcement Learning: An Adaptive Spectral Based Linear Approach

Yi, Qianxin, Lin, Shao-Bo, Fan, Jun, Wang, Yao

arXiv.org Machine LearningOct-7-2025

Reinforcement learning (RL) has been widely applied to sequential decision making, where interpretability and performance are both critical for practical adoption. Current approaches typically focus on performance and rely on post hoc explanations to account for interpretability. Different from these approaches, we focus on designing an interpretability-oriented yet performance-enhanced RL approach. Specifically, we propose a spectral based linear RL method that extends the ridge regression-based approach through a spectral filter function. The proposed method clarifies the role of regularization in controlling estimation error and further enables the design of an adaptive regularization parameter selection strategy guided by the bias-variance trade-off principle. Theoretical analysis establishes near-optimal bounds for both parameter estimation and generalization error. Extensive experiments on simulated environments and real-world datasets from Kuaishou and Taobao demonstrate that our method either outperforms or matches existing baselines in decision quality. We also conduct interpretability analyses to illustrate how the learned policies make decisions, thereby enhancing user trust. These results highlight the potential of our approach to bridge the gap between RL theory and practical decision making, providing interpretability, accuracy, and adaptability in management contexts.

adaptive spectral, balancing interpretability and performance, reinforcement learning, (9 more...)

arXiv.org Machine Learning

2510.03722

Country:

North America > United States (0.27)
Asia > China > Shaanxi Province > Xi'an (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong > Kowloon (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

From Moments to Models: Graphon Mixture-Aware Mixup and Contrastive Learning

Azizpour, Ali, Ramezanpour, Reza, Sabharwal, Ashutosh, Segarra, Santiago

arXiv.org Machine LearningOct-7-2025

Real-world graph datasets often consist of mixtures of populations, where graphs are generated from multiple distinct underlying distributions. However, modern representation learning approaches, such as graph contrastive learning (GCL) and augmentation methods like Mixup, typically overlook this mixture structure. In this work, we propose a unified framework that explicitly models data as a mixture of underlying probabilistic graph generative models represented by graphons. To characterize these graphons, we leverage graph moments (motif densities) to cluster graphs arising from the same model. This enables us to disentangle the mixture components and identify their distinct generative mechanisms. This model-aware partitioning benefits two key graph learning tasks: 1) It enables a graphon-mixture-aware mixup (GMAM), a data augmentation technique that interpolates in a semantically valid space guided by the estimated graphons, instead of assuming a single graphon per class. Additionally, by introducing a new model-aware objective, our proposed approach (termed MGCL) improves negative sampling by restricting negatives to graphs from other models. We establish a key theoretical guarantee: a novel, tighter bound showing that graphs sampled from graphons with small cut distance will have similar motif densities with high probability. Extensive experiments on benchmark datasets demonstrate strong empirical performance. In unsupervised learning, MGCL achieves state-of-the-art results, obtaining the top average rank across eight datasets. In supervised learning, GMAM consistently outperforms existing strategies, achieving new state-of-the-art accuracy in 6 out of 7 datasets.

dataset, graph, graphon, (15 more...)

arXiv.org Machine Learning

2510.0369

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
North America > United States (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

Longitudinal Flow Matching for Trajectory Modeling

Islam, Mohammad Mohaiminul, Kuipers, Thijs P., Vadgama, Sharvaree, de Vente, Coen, Khan, Afsana, Sánchez, Clara I., Bekkers, Erik J.

arXiv.org Machine LearningOct-7-2025

Generative models for sequential data often struggle with sparsely sampled and high-dimensional trajectories, typically reducing the learning of dynamics to pairwise transitions. We propose \textit{Interpolative Multi-Marginal Flow Matching} (IMMFM), a framework that learns continuous stochastic dynamics jointly consistent with multiple observed time points. IMMFM employs a piecewise-quadratic interpolation path as a smooth target for flow matching and jointly optimizes drift and a data-driven diffusion coefficient, supported by a theoretical condition for stable learning. This design captures intrinsic stochasticity, handles irregular sparse sampling, and yields subject-specific trajectories. Experiments on synthetic benchmarks and real-world longitudinal neuroimaging datasets show that IMMFM outperforms existing methods in both forecasting accuracy and further downstream tasks.

dataset, longitudinal flow matching, trajectory, (10 more...)

arXiv.org Machine Learning

2510.03569

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York (0.04)
North America > Canada (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Health Care Technology (0.88)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.47)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(5 more...)

Add feedback

Latent Uncertainty Representations for Video-based Driver Action and Intention Recognition

Vellenga, Koen, Steinhauer, H. Joe, Andersson, Jonas, Sjögren, Anders

arXiv.org Artificial IntelligenceOct-7-2025

Deep neural networks (DNNs) are increasingly applied to safety-critical tasks in resource-constrained environments, such as video-based driver action and intention recognition. While last layer probabilistic deep learning (LL-PDL) methods can detect out-of-distribution (OOD) instances, their performance varies. As an alternative to last layer approaches, we propose extending pre-trained DNNs with transformation layers to produce multiple latent representations to estimate the uncertainty. W e evaluate our latent uncertainty representation (LUR) and repulsively trained LUR (RLUR) approaches against eight PDL methods across four video-based driver action and intention recognition datasets, comparing classification performance, calibration, and uncertainty-based OOD detection. W e also contribute 28,000 frame-level action labels and 1,194 video-level intention labels for the NuScenes dataset. Our results show that LUR and RLUR achieve comparable in-distribution classification performance to other LL-PDL approaches. F or uncertainty-based OOD detection, LUR matches top-performing PDL methods while being more efficient to train and easier to tune than approaches that require Markov-Chain Monte Carlo sampling or repulsive training procedures.

artificial intelligence, machine learning, representation, (18 more...)

arXiv.org Artificial Intelligence

2510.05006

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.68)

Industry: Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

RFCAudit: An LLM Agent for Functional Bug Detection in Network Protocols

Zheng, Mingwei, Wang, Chengpeng, Liu, Xuwei, Guo, Jinyao, Feng, Shiwei, Zhang, Xiangyu

arXiv.org Artificial IntelligenceOct-7-2025

Functional correctness is critical for ensuring the reliability and security of network protocol implementations. Functional bugs, instances where implementations diverge from behaviors specified in RFC documents, can lead to severe consequences, including faulty routing, authentication bypasses, and service disruptions. Detecting these bugs requires deep semantic analysis across specification documents and source code, a task beyond the capabilities of traditional static analysis tools. This paper introduces RFCAudit, an autonomous agent that leverages large language models (LLMs) to detect functional bugs by checking conformance between network protocol implementations and their RFC specifications. Inspired by the human auditing procedure, RFCAudit comprises two key components: an indexing agent and a detection agent. The former hierarchically summarizes protocol code semantics, generating semantic indexes that enable the detection agent to narrow down the scanning scope. The latter employs demand-driven retrieval to iteratively collect additional relevant data structures and functions, eventually identifying potential inconsistencies with the RFC specifications effectively. We evaluate RFCAudit across six real-world network protocol implementations. RFCAudit identifies 47 functional bugs with 81.9% precision, of which 20 bugs have been confirmed or fixed by developers.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.00714

Country:

Europe (1.00)
North America > United States > California > San Francisco County > San Francisco (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Add feedback

Finish First, Perfect Later: Test-Time Token-Level Cross-Validation for Diffusion Large Language Models

Tian, Runchu, Cui, Junxia, Xu, Xueqiang, Yao, Feng, Shang, Jingbo

arXiv.org Artificial IntelligenceOct-7-2025

Diffusion large language models (dLLMs) have recently emerged as a promising alternative to autoregressive (AR) models, offering advantages such as accelerated parallel decoding and bidirectional context modeling. However, the vanilla decoding strategy in discrete dLLMs suffers from a critical limitation: once a token is accepted, it can no longer be revised in subsequent steps. As a result, early mistakes persist across iterations, harming both intermediate predictions and final output quality. To address this issue, we propose Tolerator (Token-Level Cross-Validation Refinement), a training-free decoding strategy that leverages cross-validation among predicted tokens. Unlike existing methods that follow a single progressive unmasking procedure, Tolerator introduces a two-stage process: (i) sequence fill-up and (ii) iterative refinement by remasking and decoding a subset of tokens while treating the remaining as context. This design enables previously accepted tokens to be reconsidered and corrected when necessary, leading to more reliable diffusion decoding outputs. We evaluate Tolerator on five standard benchmarks covering language understanding, code generation, and mathematics. Experiments show that our method achieves consistent improvements over the baselines under the same computational budget. These findings suggest that decoding algorithms are crucial to realizing the full potential of diffusion large language models. Code and data are publicly available.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.0509

Country: North America > United States (0.67)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Federated Computation of ROC and PR Curves

Xu, Xuefeng, Cormode, Graham

arXiv.org Artificial IntelligenceOct-7-2025

Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves are fundamental tools for evaluating machine learning classifiers, offering detailed insights into the trade-offs between true positive rate vs. false positive rate (ROC) or precision vs. recall (PR). However, in Federated Learning (FL) scenarios, where data is distributed across multiple clients, computing these curves is challenging due to privacy and communication constraints. Specifically, the server cannot access raw prediction scores and class labels, which are used to compute the ROC and PR curves in a centralized setting. In this paper, we propose a novel method for approximating ROC and PR curves in a federated setting by estimating quantiles of the prediction score distribution under distributed differential privacy. We provide theoretical bounds on the Area Error (AE) between the true and estimated curves, demonstrating the trade-offs between approximation accuracy, privacy, and communication cost. Empirical results on real-world datasets demonstrate that our method achieves high approximation accuracy with minimal communication and strong privacy guarantees, making it practical for privacy-preserving model evaluation in federated systems.

artificial intelligence, machine learning, pr curve, (15 more...)

arXiv.org Artificial Intelligence

2510.04979

Country:

Europe (0.93)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.94)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Natural Language Edge Labelling: Decoupling Intent from Execution in Structured LM Reasoning

Madahar, Abhinav

arXiv.org Artificial IntelligenceOct-7-2025

Controllers for structured LM reasoning (e.g., Chain-of-Thought, self-consistency, and Tree-of-Thoughts) often entangle what to try next with how to execute it, exposing only coarse global knobs and yielding brittle, compute-inefficient, and hard-to-audit behavior. We introduce Natural Language Edge Labelling (NLEL), a labeller-tuner overlay that attaches a free-form natural-language directive to each search edge and translates it into a schema-bounded control vector for decoding, search (branch quotas, exploration $β$), generation bundle size, retrieval mixtures, and verification passes. A labeller $Λ$ emits labels from the parent state and a compact context; a tuner $Ψ$ maps $(P, L, C)\to Π$, with strict schema validation and trust-region projection around safe defaults. Downstream selection remains ToT-style with score $S=μ+βσ$ and depth-annealed $β$. We show NLEL strictly generalizes CoT/ToT, prove an anytime-monotonicity property for top-$k$ selection under label-conditioned bundles, and bound selector shortfall by control-vector distortion, providing decision-relevant justification for guards like trust regions and verification passes. We instantiate $Ψ$ as a prompt-only JSON Parameter Emitter and preregister an evaluation on GSM8K, MATH (subset), StrategyQA, and ARC-Challenge with compute-aware reporting (success@compute, tokens-per-success) and ablations over $Λ$, $Ψ$, trust-region radius, and control quantization; preregistered forecasts anticipate accuracy gains at comparable token budgets and improved success@compute under constraints. NLEL offers an interpretable, model-agnostic interface that separates intent from execution for controllable, auditable LM inference.

artificial intelligence, large language model, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2510.04817

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Federated Learning for Surgical Vision in Appendicitis Classification: Results of the FedSurg EndoVis 2024 Challenge

Kirchner, Max, Hoffmann, Hanna, Jenke, Alexander C., Saldanha, Oliver L., Pfeiffer, Kevin, Kanjo, Weam, Alekseenko, Julia, de Boer, Claas, Kolamuri, Santhi Raj, Mazza, Lorenzo, Padoy, Nicolas, Bano, Sophia, Reinke, Annika, Maier-Hein, Lena, Stoyanov, Danail, Kather, Jakob N., Kolbinger, Fiona R., Bodenstedt, Sebastian, Speidel, Stefanie

arXiv.org Artificial IntelligenceOct-7-2025

Purpose: The FedSurg challenge was designed to benchmark the state of the art in federated learning for surgical video classification. Its goal was to assess how well current methods generalize to unseen clinical centers and adapt through local fine-tuning while enabling collaborative model development without sharing patient data. Methods: Participants developed strategies to classify inflammation stages in appendicitis using a preliminary version of the multi-center Appendix300 video dataset. The challenge evaluated two tasks: generalization to an unseen center and center-specific adaptation after fine-tuning. Submitted approaches included foundation models with linear probing, metric learning with triplet loss, and various FL aggregation schemes (FedAvg, FedMedian, FedSAM). Performance was assessed using F1-score and Expected Cost, with ranking robustness evaluated via bootstrapping and statistical testing. Results: In the generalization task, performance across centers was limited. In the adaptation task, all teams improved after fine-tuning, though ranking stability was low. The ViViT-based submission achieved the strongest overall performance. The challenge highlighted limitations in generalization, sensitivity to class imbalance, and difficulties in hyperparameter tuning in decentralized training, while spatiotemporal modeling and context-aware preprocessing emerged as promising strategies. Conclusion: The FedSurg Challenge establishes the first benchmark for evaluating FL strategies in surgical video classification. Findings highlight the trade-off between local personalization and global robustness, and underscore the importance of architecture choice, preprocessing, and loss design. This benchmarking offers a reference point for future development of imbalance-aware, adaptive, and robust FL methods in clinical surgical AI.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.04772

Country:

North America > United States > Indiana (0.28)
Europe > Germany > Baden-Württemberg > Karlsruhe Region (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.93)
Health & Medicine > Therapeutic Area > Gastroenterology (0.85)
Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Predictive economics: Rethinking economic methodology with machine learning

Pereira, Miguel Alves

arXiv.org Artificial IntelligenceOct-7-2025

This article proposes predictive economics as a distinct analytical perspective within economics, grounded in machine learning and centred on predictive accuracy rather than causal identification. Drawing on the instrumentalist tradition (Friedman), the explanation-prediction divide (Shmueli), and the contrast between modelling cultures (Breiman), we formalise prediction as a valid epistemological and methodological objective. Reviewing recent applications across economic subfields, we show how predictive models contribute to empirical analysis, particularly in complex or data-rich contexts. This perspective complements existing approaches and supports a more pluralistic methodology - one that values out-of-sample performance alongside interpretability and theoretical structure. Keywords: Predictive economics, Machine learning, Forecasting, Causal inference, Economic methodology 1. Introduction The evolution of economics has long been shaped by advances in analytical tools.

artificial intelligence, economics, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.04726

Country: North America > United States (1.00)

Genre: Research Report (0.40)

Industry:

Banking & Finance > Economy (1.00)
Energy (0.69)
Government > Regional Government > North America Government > United States Government (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.36)

Add feedback