AITopics

Technology: Information Technology > Artificial Intelligence (0.81)

Neural Information Processing SystemsFeb-12-2026, 04:25:55 GMT

45d4924460c37853d57885d8af0b8d5c-Paper-Conference.pdf

machine learning, natural language, target model, (20 more...)

Country:

Asia > China > Zhejiang Province > Ningbo (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)
(2 more...)

arXiv.org Artificial IntelligenceDec-11-2025

An Electrocardiogram Multi-task Benchmark with Comprehensive Evaluations and Insightful Findings

Xu, Yuhao, Lu, Jiaying, Ding, Sirui, Cao, Defu, Hu, Xiao, Yang, Carl

In the process of patient diagnosis, non-invasive measurements are widely used due to their low risks and quick results. Electrocardiogram (ECG), as a non-invasive method to collect heart activities, is used to diagnose cardiac conditions. Analyzing the ECG typically requires domain expertise, which is a roadblock to applying artificial intelligence (AI) for healthcare. Through advances in self-supervised learning and foundation models, AI systems can now acquire and leverage domain knowledge without relying solely on human expertise. However, there is a lack of comprehensive analyses over the foundation models' performance on ECG. This study aims to answer the research question: "Are Foundation Models Useful for ECG Analysis?" To address it, we evaluate language/general time-series/ECG foundation models in comparison with time-series deep learning models. The experimental results show that general time-series/ECG foundation models achieve a top performance rate of 80%, indicating their effectiveness in ECG analysis. In-depth analyses and insights are provided along with comprehensive experimental results. This study highlights the limitations and potential of foundation models in advancing physiological waveform analysis. The data and code for this benchmark are publicly available at https://github.com/yuhaoxu99/ECGMultitasks-Benchmark.

large language model, machine learning, natural language, (15 more...)

2512.08954

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceDec-3-2025

SurveyEval: Towards Comprehensive Evaluation of LLM-Generated Academic Surveys

Zhao, Jiahao, Zhang, Shuaixing, Xu, Nan, Wang, Lei

LLM-based automatic survey systems are transforming how users acquire information from the web by integrating retrieval, organization, and content synthesis into end-to-end generation pipelines. While recent works focus on developing new generation pipelines, how to evaluate such complex systems remains a significant challenge. To this end, we introduce SurveyEval, a comprehensive benchmark that evaluates automatically generated surveys across three dimensions: overall quality, outline coherence, and reference accuracy. We extend the evaluation across 7 subjects and augment the LLM-as-a-Judge framework with human references to strengthen evaluation-human alignment. Evaluation results show that while general long-text or paper-writing systems tend to produce lower-quality surveys, specialized survey-generation systems are able to deliver substantially higher-quality results. We envision SurveyEval as a scalable testbed to understand and improve automatic survey systems across diverse subjects and evaluation criteria.

large language model, natural language, surveyeval, (14 more...)

2512.02763

Country:

Asia > China (0.17)
North America > United States (0.14)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial IntelligenceNov-25-2025

Fine-Grained GRPO for Precise Preference Alignment in Flow Models

Zhou, Yujie, Ling, Pengyang, Bu, Jiazi, Wang, Yibin, Zang, Yuhang, Wang, Jiaqi, Niu, Li, Zhai, Guangtao

The incorporation of online reinforcement learning (RL) into diffusion and flow-based generative models has recently gained attention as a powerful paradigm for aligning model behavior with human preferences. By leveraging stochastic sampling via Stochastic Differential Equations (SDEs) during the denoising phase, these models can explore a variety of denoising trajectories, enhancing the exploratory capacity of RL. However, despite their ability to discover potentially high-reward samples, current approaches often struggle to effectively align with preferences due to the sparsity and narrowness of reward feedback. To overcome this limitation, we introduce a novel framework called Granular-GRPO (G$^2$RPO), which enables fine-grained and comprehensive evaluation of sampling directions in the RL training of flow models. Specifically, we propose a Singular Stochastic Sampling mechanism that supports step-wise stochastic exploration while ensuring strong correlation between injected noise and reward signals, enabling more accurate credit assignment to each SDE perturbation. Additionally, to mitigate the bias introduced by fixed-granularity denoising, we design a Multi-Granularity Advantage Integration module that aggregates advantages computed across multiple diffusion scales, resulting in a more robust and holistic assessment of sampling trajectories. Extensive experiments on various reward models, including both in-domain and out-of-domain settings, demonstrate that our G$^2$RPO outperforms existing flow-based GRPO baselines, highlighting its effectiveness and generalization capability.

arxiv preprint arxiv, machine learning, reinforcement learning, (18 more...)

2510.01982

Genre:

Research Report (0.82)
Workflow (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Schlinge, Philipp, Meinert, Steffen, Atzmueller, Martin

Comprehensive Evaluation of Prototype Neural Networks

arXiv.org Artificial IntelligenceNov-24-2025

Prototype models are an important method for explainable artificial intelligence (XAI) and interpretable machine learning. In this paper, we perform an in-depth analysis of a set of prominent prototype models including ProtoPNet, ProtoPool and PIPNet. For their assessment, we apply a comprehensive set of metrics. In addition to applying standard metrics from literature, we propose several new metrics to further complement the analysis of model interpretability. In our experimentation, we apply the set of prototype models on a diverse set of datasets including fine-grained classification, Non-IID settings and multi-label classification to further contrast the performance. Furthermore, we also provide our code as an open-source library (https://github.com/uos-sis/quanproto), which facilitates simple application of the metrics itself, as well as extensibility -- providing the option for easily adding new metrics and models.

artificial intelligence, machine learning, natural language, (19 more...)

doi: 10.1007/s13218-025-00900-0

2507.06819

Country: North America (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing SystemsOct-10-2025, 00:57:46 GMT

45d4924460c37853d57885d8af0b8d5c-Paper-Conference.pdf

dataset, lg-cav, target model, (15 more...)

Country:

Asia > China > Zhejiang Province > Ningbo (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)
(2 more...)

Jahan, Israt, Laskar, Md Tahmid Rahman, Peng, Chun, Huang, Jimmy

Evaluating the Effectiveness of Cost-Efficient Large Language Models in Benchmark Biomedical Tasks

arXiv.org Artificial IntelligenceJul-21-2025

This paper presents a comprehensive evaluation of cost-efficient Large Language Models (LLMs) for diverse biomedical tasks spanning both text and image modalities. We evaluated a range of closed-source and open-source LLMs on tasks such as biomedical text classification and generation, question answering, and multimodal image processing. Our experimental findings indicate that there is no single LLM that can consistently outperform others across all tasks. Instead, different LLMs excel in different tasks. While some closed-source LLMs demonstrate strong performance on specific tasks, their open-source counterparts achieve comparable results (sometimes even better), with additional benefits like faster inference and enhanced privacy. Our experimental results offer valuable insights for selecting models that are optimally suited for specific biomedical applications.

large language model, machine learning, natural language, (19 more...)

2507.14045

Country: North America > Canada > Ontario (0.14)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsMay-27-2025, 16:25:59 GMT

Toward a Stable, Fair, and Comprehensive Evaluation of Object Hallucination in Large Vision-Language Models

artificial intelligence, hallucination, natural language, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (0.64)
Information Technology > Artificial Intelligence > Natural Language (0.64)

arXiv.org Artificial IntelligenceMay-23-2025

A Comprehensive Evaluation of Contemporary ML-Based Solvers for Combinatorial Optimization

Feng, Shengyu, Sun, Weiwei, Li, Shanda, Talwalkar, Ameet, Yang, Yiming

Machine learning (ML) has demonstrated considerable potential in supporting model design and optimization for combinatorial optimization (CO) problems. However, much of the progress to date has been evaluated on small-scale, synthetic datasets, raising concerns about the practical effectiveness of ML-based solvers in real-world, large-scale CO scenarios. Additionally, many existing CO benchmarks lack sufficient training data, limiting their utility for evaluating data-driven approaches. To address these limitations, we introduce FrontierCO, a comprehensive benchmark that covers eight canonical CO problem types and evaluates 16 representative ML-based solvers--including graph neural networks and large language model (LLM) agents. FrontierCO features challenging instances drawn from industrial applications and frontier CO research, offering both realistic problem difficulty and abundant training data. Our empirical results provide critical insights into the strengths and limitations of current ML methods, helping to guide more robust and practically relevant advances at the intersection of machine learning and combinatorial optimization. Our data is available at https://huggingface.co/datasets/CO-Bench/FrontierCO.

large language model, machine learning, natural language, (13 more...)

2505.16952

Country:

North America > United States (0.68)
Asia (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)