RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies

Yakefu, Adina, Xie, Bin, Xu, Chongyang, Zhang, Enwen, Zhou, Erjin, Jia, Fan, Yang, Haitao, Fan, Haoqiang, Zhang, Haowei, Peng, Hongyang, Tan, Jing, Huang, Junwen, Liu, Kai, Liu, Kaixin, Gu, Kefan, Zhang, Qinglun, Zhang, Ruitao, Huang, Saike, Cheng, Shen, Liu, Shuaicheng, Wang, Tiancai, Wang, Tiezhen, Sun, Wei, Tang, Wenbin, Wei, Yajun, Chen, Yang, Gui, Youqiang, Zhao, Yucheng, Ma, Yunchao, Wei, Yunfei, Yang, Yunhuan, Guo, Yutong, Chen, Ze, Du, Zhengyuan, Zhang, Ziheng, Liu, Ziming, Yan, Ziwei

arXiv.org Artificial Intelligence

Testing on real machines is indispensable for robotic control algorithms. For learning-based algorithms, especially VLA models, the demand for large-scale evaluation, i.e., testing a large number of models on a large number of tasks, is becoming increasingly urgent. However, doing this well is highly non-trivial, especially when scalability and reproducibility are taken into account. In this report, we describe our methodology for constructing RoboChallenge, an online evaluation system for testing robotic control algorithms, and our survey of recent state-of-the-art VLA models using our initial benchmark, Table30.


Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms

Wang, Wei, Wu, Dong-Dong, Li, Ming, Zhang, Jingxiong, Niu, Gang, Sugiyama, Masashi

arXiv.org Artificial Intelligence

Positive-unlabeled (PU) learning is a weakly supervised binary classification problem, in which the goal is to learn a binary classifier from only positive and unlabeled data, without access to negative data. In recent years, many PU learning algorithms have been developed to improve model performance. However, experimental settings are highly inconsistent, making it difficult to identify which algorithm performs better. In this paper, we propose the first PU learning benchmark to systematically compare PU learning algorithms. During our implementation, we identify subtle yet critical factors that affect the realistic and fair evaluation of PU learning algorithms. On the one hand, many PU learning algorithms rely on a validation set that includes negative data for model selection. This is unrealistic in traditional PU learning settings, where no negative data are available. To handle this problem, we systematically investigate model selection criteria for PU learning. On the other hand, the problem settings and solutions of PU learning fall into different families, i.e., the one-sample and two-sample settings. However, existing evaluation protocols are heavily biased towards the one-sample setting and neglect the significant difference between the two. We identify the internal label shift problem of unlabeled training data in the one-sample setting and propose a simple yet effective calibration approach to ensure fair comparisons within and across families. We hope our framework will provide an accessible, realistic, and fair environment for evaluating PU learning algorithms in the future.
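A widely used baseline in the one-sample setting is the non-negative PU (nnPU) risk estimator. The sketch below is a minimal NumPy illustration (not the benchmark's code), assuming the class prior `pi_p` is known and using the sigmoid loss:

```python
import numpy as np

def sigmoid_loss(scores, y):
    # l(z, y) = sigmoid(-y * z): small when sign(z) matches label y
    return 1.0 / (1.0 + np.exp(y * scores))

def nnpu_risk(scores_p, scores_u, pi_p):
    """Non-negative PU risk estimate.

    scores_p: classifier scores on labeled positive data
    scores_u: classifier scores on unlabeled data
    pi_p:     class prior P(y = +1), assumed known
    """
    r_p_pos = sigmoid_loss(scores_p, +1).mean()  # positives labeled +1
    r_p_neg = sigmoid_loss(scores_p, -1).mean()  # positives treated as -1
    r_u_neg = sigmoid_loss(scores_u, -1).mean()  # unlabeled treated as -1
    # The estimated negative risk is clipped at zero, which is what
    # distinguishes nnPU from the unbiased (uPU) estimator.
    return pi_p * r_p_pos + max(0.0, r_u_neg - pi_p * r_p_neg)
```

A classifier that scores positives high and unlabeled (mostly negative) data low yields a near-zero risk, while a reversed classifier yields a large one.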


Prediction of Hospital Associated Infections During Continuous Hospital Stays

Datta, Rituparna, Kamruzzaman, Methun, Klein, Eili Y., Madden, Gregory R, Deng, Xinwei, Vullikanti, Anil, Bhattacharya, Parantapa

arXiv.org Artificial Intelligence

The US Centers for Disease Control and Prevention (CDC), in 2019, designated Methicillin-resistant Staphylococcus aureus (MRSA) as a serious antimicrobial resistance threat. The risk of acquiring MRSA and suffering life-threatening consequences remains especially high for hospitalized patients due to a unique combination of factors, including co-morbid conditions, immunosuppression, antibiotic use, and the risk of contact with contaminated hospital workers and equipment. In this paper, we present a novel generative probabilistic model, GenHAI, for modeling sequences of MRSA test outcomes for patients during a single hospitalization. This model can be used to answer many questions that matter to hospital administrators seeking to mitigate the risk of MRSA infections. Our model is based on the probabilistic programming paradigm and can be used to approximately answer a variety of predictive, causal, and counterfactual questions. We demonstrate the efficacy of our model by comparing it against discriminative and generative machine learning models on two real-world datasets.
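As a toy illustration of the generative idea (this is not GenHAI, and the transition probabilities below are invented), even a two-state Markov chain over per-stay test outcomes supports simple predictive queries by Monte Carlo sampling:

```python
import random

# Toy two-state Markov model of a patient's MRSA test sequence during one
# hospitalization. Probabilities are made up for illustration only.
TRANS = {"neg": {"neg": 0.95, "pos": 0.05},
         "pos": {"neg": 0.30, "pos": 0.70}}

def sample_stay(n_tests, rng, start="neg"):
    """Sample one patient's sequence of test outcomes."""
    seq, state = [], start
    for _ in range(n_tests):
        state = "pos" if rng.random() < TRANS[state]["pos"] else "neg"
        seq.append(state)
    return seq

def prob_any_positive(n_tests, n_samples=2000, seed=1):
    """Monte Carlo estimate of P(at least one positive test in the stay)."""
    rng = random.Random(seed)
    hits = sum("pos" in sample_stay(n_tests, rng) for _ in range(n_samples))
    return hits / n_samples
```

A full probabilistic program would additionally condition on covariates and interventions, which is what enables the causal and counterfactual queries described above.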


Tree-of-Reasoning: Towards Complex Medical Diagnosis via Multi-Agent Reasoning with Evidence Tree

Peng, Qi, Cui, Jialin, Xie, Jiayuan, Cai, Yi, Li, Qing

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown great potential in the medical domain. However, existing models still fall short on complex medical diagnosis tasks in the real world, mainly because they lack sufficient reasoning depth: when processing large amounts of specialized medical data, they suffer information loss or logical jumps, which leads to diagnostic errors. To address these challenges, we propose Tree-of-Reasoning (ToR), a novel multi-agent framework designed to handle complex scenarios. Specifically, ToR introduces a tree structure that clearly records the reasoning path of LLMs and the corresponding clinical evidence. At the same time, we propose a cross-validation mechanism to ensure the consistency of multi-agent decision-making, thereby improving clinical reasoning ability in complex medical scenarios. Experimental results on real-world medical data show that our framework achieves better performance than existing baseline methods.
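A hypothetical sketch of such an evidence tree (names and fields are illustrative, not ToR's actual data structures) might record hypotheses with supporting evidence and enumerate root-to-leaf reasoning paths for cross-checking:

```python
from dataclasses import dataclass, field

@dataclass
class ReasoningNode:
    hypothesis: str
    evidence: list = field(default_factory=list)   # supporting clinical findings
    children: list = field(default_factory=list)   # refined sub-hypotheses

    def add_child(self, hypothesis, evidence=()):
        child = ReasoningNode(hypothesis, list(evidence))
        self.children.append(child)
        return child

    def paths(self):
        """Enumerate root-to-leaf reasoning paths for review or validation."""
        if not self.children:
            return [[self.hypothesis]]
        return [[self.hypothesis] + p for c in self.children for p in c.paths()]
```

Recording each branch explicitly is what lets a second agent audit a diagnosis path step by step rather than re-deriving it from scratch.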


Secure Multifaceted-RAG for Enterprise: Hybrid Knowledge Retrieval with Security Filtering

Byun, Grace, Lee, Shinsun, Choi, Nayoung, Choi, Jinho D.

arXiv.org Artificial Intelligence

Existing Retrieval-Augmented Generation (RAG) systems face challenges in enterprise settings due to limited retrieval scope and data security risks. When relevant internal documents are unavailable, the system struggles to generate accurate and complete responses. Additionally, using closed-source Large Language Models (LLMs) raises concerns about exposing proprietary information. To address these issues, we propose the Secure Multifaceted-RAG (SecMulti-RAG) framework, which retrieves not only from internal documents but also from two supplementary sources: pre-generated expert knowledge for anticipated queries and on-demand external LLM-generated knowledge. To mitigate security risks, we adopt a local open-source generator and selectively utilize external LLMs only when prompts are deemed safe by a filtering mechanism. This approach enhances completeness, prevents data leakage, and reduces costs. In our evaluation on a report generation task in the automotive industry, SecMulti-RAG significantly outperforms traditional RAG, achieving 79.3 to 91.9 percent win rates across correctness, richness, and helpfulness in LLM-based evaluation, and 56.3 to 70.4 percent in human evaluation. This highlights SecMulti-RAG as a practical and secure solution for enterprise RAG.
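The routing idea can be sketched as follows. The keyword filter here is only a placeholder (the paper's filtering mechanism is more sophisticated); it illustrates the principle that prompts flagged as sensitive stay with the local generator while safe ones may go to an external LLM:

```python
import re

# Illustrative-only pattern list; a real deployment would use a trained
# safety classifier rather than keywords.
CONFIDENTIAL = re.compile(r"\b(internal|proprietary|confidential|prototype)\b",
                          re.IGNORECASE)

def route(prompt: str) -> str:
    """Return which generator may see this prompt: 'local' or 'external'."""
    return "local" if CONFIDENTIAL.search(prompt) else "external"
```

The asymmetry is deliberate: a false positive merely costs some answer quality (local model only), whereas a false negative would leak proprietary text to an external provider.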


DiaLLMs: EHR Enhanced Clinical Conversational System for Clinical Test Recommendation and Diagnosis Prediction

Ren, Weijieying, Zhao, Tianxiang, Wang, Lei, Wang, Tianchun, Honavar, Vasant

arXiv.org Artificial Intelligence

Recent advances in Large Language Models (LLMs) have led to remarkable progress in medical consultation. However, existing medical LLMs overlook the essential role of Electronic Health Records (EHR) and focus primarily on diagnosis recommendation, limiting their clinical applicability. We propose DiaLLM, the first medical LLM that integrates heterogeneous EHR data into clinically grounded dialogues, enabling clinical test recommendation, result interpretation, and diagnosis prediction to better align with real-world medical practice. To construct clinically grounded dialogues from EHR, we design a Clinical Test Reference (CTR) strategy that maps each clinical code to its corresponding description and classifies test results as "normal" or "abnormal". Additionally, DiaLLM employs a reinforcement learning framework for evidence acquisition and automated diagnosis. To handle the large action space, we introduce a rejection sampling strategy to reduce redundancy and improve exploration efficiency. Furthermore, a confirmation reward and a class-sensitive diagnosis reward are designed to guide accurate diagnosis prediction. Extensive experimental results demonstrate that DiaLLM outperforms baselines in clinical test recommendation and diagnosis prediction.
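The redundancy-rejection idea can be illustrated with a generic rejection-style sampler (a sketch, not DiaLLM's implementation; the test names are hypothetical):

```python
import random

def sample_tests(candidates, already_ordered, k, seed=0):
    """Propose k clinical tests, rejecting redundant proposals.

    candidates:      full action space of orderable tests
    already_ordered: tests already in the patient's record
    k:               number of new tests to propose
    """
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < k:
        t = rng.choice(candidates)
        if t in already_ordered or t in chosen:
            continue  # reject: the test would add no new evidence
        chosen.append(t)
    return chosen
```

Rejecting duplicates at sampling time keeps the effective action space small without having to re-enumerate it after every dialogue turn.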


Converting Annotated Clinical Cases into Structured Case Report Forms

Ferrazzi, Pietro, Lavelli, Alberto, Magnini, Bernardo

arXiv.org Artificial Intelligence

Case Report Forms (CRFs) are widely used in medical research as they ensure accuracy, reliability, and validity of results in clinical studies. However, publicly available, well-annotated CRF datasets are scarce, limiting the development of CRF slot filling systems able to fill in a CRF from clinical notes. To mitigate this scarcity, we propose to take advantage of available datasets annotated for information extraction tasks and to convert them into structured CRFs. We present a semi-automatic conversion methodology, which has been applied to the E3C dataset in two languages (English and Italian), resulting in a new, high-quality dataset for CRF slot filling. Through several experiments on the created dataset, we report that slot filling achieves 59.7% for Italian and 67.3% for English with a closed-source Large Language Model (zero-shot), and worse performance on three families of open-source models, showing that filling CRFs is challenging even for recent state-of-the-art LLMs. We release the dataset at https://huggingface.co/collections/NLP-FBK/e3c-to-crf-67b9844065460cbe42f80166
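The core of the conversion, turning span-level annotations into a structured form, can be sketched as below. The slot labels and the keep-first-mention policy are illustrative assumptions, not the paper's full E3C-to-CRF mapping:

```python
def fill_crf(annotations, crf_slots):
    """Map IE-style annotations (text, label) into a CRF dictionary.

    annotations: list of (span_text, label) pairs from an annotated note
    crf_slots:   the slots the target CRF expects
    """
    crf = {slot: None for slot in crf_slots}
    for text, label in annotations:
        # Keep the first mention per slot; later mentions are ignored here,
        # whereas a real pipeline would need a disambiguation step.
        if label in crf and crf[label] is None:
            crf[label] = text
    return crf
```

Slots with no matching annotation stay empty, which is itself useful signal: it marks exactly where manual review of the converted form is needed.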


TestAgent: An Adaptive and Intelligent Expert for Human Assessment

Yu, Junhao, Zhuang, Yan, Sun, YuXuan, Gao, Weibo, Liu, Qi, Cheng, Mingyue, Huang, Zhenya, Chen, Enhong

arXiv.org Artificial Intelligence

Accurately assessing internal human states is key to understanding preferences, offering personalized services, and identifying challenges in real-world applications. Originating from psychometrics, adaptive testing has become the mainstream method for human measurement and has now been widely applied in education, healthcare, sports, and sociology. It customizes assessments by selecting the fewest test questions necessary. However, current adaptive testing methods face several challenges. The mechanized nature of most algorithms leads to guessing behavior and difficulties with open-ended questions. Additionally, subjective assessments suffer from noisy response data and coarse-grained test outputs, further limiting their effectiveness. To move closer to an ideal adaptive testing process, we propose TestAgent, a large language model (LLM)-powered agent designed to enhance adaptive testing through interactive engagement. This is the first application of LLMs in adaptive testing. TestAgent supports personalized question selection, captures test-takers' responses and anomalies, and provides precise outcomes through dynamic, conversational interactions. Experiments on psychological, educational, and lifestyle assessments show our approach achieves more accurate results with 20% fewer questions than state-of-the-art baselines, and testers preferred it for speed, smoothness, and other dimensions.
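Classic adaptive testing selects the next item by maximizing Fisher information under an item response theory model. The sketch below shows the standard two-parameter logistic (2PL) version from psychometrics, the mechanized baseline the paper contrasts with, not TestAgent's LLM-driven selection:

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta
    (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """Fisher information of one item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, items):
    """Pick the candidate item that is most informative at the
    current ability estimate."""
    return max(items, key=lambda it: fisher_info(theta, it["a"], it["b"]))
```

Information peaks where difficulty matches ability (p = 0.5), which is why such selectors keep asking questions near the test-taker's estimated level, and also why they struggle with open-ended or guessed responses.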


Did faulty drug tests taint parole hearings? California is reviewing hundreds of denials

Los Angeles Times

The California Department of Corrections and Rehabilitation is reviewing hundreds of state parole hearings to see if any inmates who were denied parole were rejected because of faulty drug tests. Nearly 6,000 drug tests in California prisons are believed to have yielded false positives between April and July last year, and attorneys for the Board of Parole are now conducting a review of inmate files to determine if any of them need to appear before the parole board again to be reconsidered, according to officials with CDCR. If any inmates were denied parole because of the faulty tests, they could be owed a new hearing before the parole board, said attorneys representing inmates affected by the defective drug tests. The review is already underway and will determine if "without the positive drug screening, there is sufficient evidence to support an incarcerated person's denial of parole," said CDCR spokesperson Emily Humpal in a statement. If there isn't enough evidence to support incarceration other than the drug test, a new hearing will be scheduled.


AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models

Perez, Miguel Angel Peñaloza, Orozco, Bruno Lopez, Soto, Jesus Tadeo Cruz, Hernandez, Michelle Bruno, Gonzalez, Miguel Angel Alvarado, Malagon, Sandra

arXiv.org Artificial Intelligence

Existing mathematical reasoning benchmarks are predominantly English-only or translation-based, which can introduce semantic drift and mask language-specific reasoning errors. To address this, we present AI4Math, a benchmark of 105 original university-level math problems natively authored in Spanish. The dataset spans seven advanced domains (Algebra, Calculus, Geometry, Probability, Number Theory, Combinatorics, and Logic), and each problem is accompanied by a step-by-step human solution. We evaluate six large language models (GPT-4o, GPT-4o mini, o3-mini, LLaMA 3.3 70B, DeepSeek R1 685B, and DeepSeek V3 685B) under four configurations: zero-shot and chain-of-thought, each in Spanish and English. The top models (o3-mini, DeepSeek R1 685B, DeepSeek V3 685B) achieve over 70% accuracy, whereas LLaMA 3.3 70B and GPT-4o mini remain below 40%. Most models show no significant performance drop between languages, with GPT-4o even performing better on Spanish problems in the zero-shot setting. Geometry, Combinatorics, and Probability questions remain persistently challenging for all models. These results highlight the need for native-language benchmarks and domain-specific evaluations to reveal reasoning failures not captured by standard metrics.