AITopics

The existing Multimodal Large Language Models (MLLMs) for GUI perception have made great progress. However, the following challenges still exist in prior methods: 1) They model discrete coordinates based on text autoregressive mechanism, which results in lower grounding accuracy and slower inference speed. 2) They can only locate predefined sets of elements and are not capable of parsing the entire interface, which hampers the broad application and support for downstream tasks. To address the above issues, we propose SparkUI-Parser, a novel end-to-end framework where higher localization precision and fine-grained parsing capability of the entire interface are simultaneously achieved. Specifically, instead of using probability-based discrete modeling, we perform continuous modeling of coordinates based on a pre-trained Multimodal Large Language Model (MLLM) with an additional token router and coordinate decoder. This effectively mitigates the limitations inherent in the discrete output characteristics and the token-by-token generation process of MLLMs, consequently boosting both the accuracy and the inference speed. To further enhance robustness, a rejection mechanism based on a modified Hungarian matching algorithm is introduced, which empowers the model to identify and reject non-existent elements, thereby reducing false positives. Moreover, we present ScreenParse, a rigorously constructed benchmark to systematically assess structural perception capabilities of GUI models across diverse scenarios. Extensive experiments demonstrate that our approach consistently outperforms SOTA methods on ScreenSpot, ScreenSpot-v2, CAGUI-Grounding and ScreenParse benchmarks. The resources are available at https://github.com/antgroup/SparkUI-Parser.

large language model, machine learning, natural language, (17 more...)

2509.04908

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Zhang, Zhihao, Peng, Chengyang, Yurtsever, Ekim, Redmill, Keith A.

Bootstrapping Reinforcement Learning with Sub-optimal Policies for Autonomous Driving

Automated vehicle control using reinforcement learning (RL) has attracted significant attention due to its potential to learn driving policies through environment interaction. However, RL agents often face training challenges in sample efficiency and effective exploration, making it difficult to discover an optimal driving strategy. To address these issues, we propose guiding the RL driving agent with a demonstration policy that need not be a highly optimized or expert-level controller. Specifically, we integrate a rule-based lane change controller with the Soft Actor Critic (SAC) algorithm to enhance exploration and learning efficiency. Our approach demonstrates improved driving performance and can be extended to other driving scenarios that can similarly benefit from demonstration-based guidance.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2509.04712

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology > Robotics & Automation (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)

Kalai, Adam Tauman, Nachum, Ofir, Vempala, Santosh S., Zhang, Edwin

Why Language Models Hallucinate

Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty. Such "hallucinations" persist even in state-of-the-art systems and undermine trust. We argue that language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty, and we analyze the statistical causes of hallucinations in the modern training pipeline. Hallucinations need not be mysterious -- they originate simply as errors in binary classification. If incorrect statements cannot be distinguished from facts, then hallucinations in pretrained language models will arise through natural statistical pressures. We then argue that hallucinations persist due to the way most evaluations are graded -- language models are optimized to be good test-takers, and guessing when uncertain improves test performance. This "epidemic" of penalizing uncertain responses can only be addressed through a socio-technical mitigation: modifying the scoring of existing benchmarks that are misaligned but dominate leaderboards, rather than introducing additional hallucination evaluations. This change may steer the field toward more trustworthy AI systems.

large language model, machine learning, natural language, (18 more...)

2509.04664

Country:

North America > United States > California > Santa Clara County (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report (0.64)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Sinha, Ashutosh Kumar, Patel, Ayush, Dudhat, Mitul, Anand, Pritam, Mishra, Rahul

i-Mask: An Intelligent Mask for Breath-Driven Activity Recognition

Human activity recognition (HAR) has gained significant attention due to its applications in health monitoring, intelligent environments, and human-computer interaction Hussain, Khan, Khan, Bhatt, Farouk, Bhola, and Baik (2024); Mishra, Gupta, Gupta, and Dutta (2022). Traditional HAR approaches employed wearable inertial sensors, vision-based methods, and environmental sensors for HAR. However, each method has inherent limitations such as discomfort, privacy concerns, or complex deployment requirements Wang, Huang, Zhao, Zhu, Huang, and Wu (2024); Mishra and Gupta (2025). The human body engages with its environment in diverse ways, one of which is the interaction between the lungs and the external environment through the act of breathing via the nose. The breathing pattern encompasses plenty of useful information that can be processed to fetch different behaviours and health information Mongelli, Orani, Cambiaso, Vaccari, Paglialonga, Braido, and Catalano (2020); Zhang, Wang, and Li (2024). Moreover, the breathing patterns are influenced by metabolic and physiological factors, offering a non-invasive and unobtrusive means of HAR.

data mining, machine learning, real time system, (20 more...)

2509.04544

Country: Asia > India (0.68)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Consumer Health (0.87)
Information Technology > Smart Houses & Appliances (0.68)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Architecture > Real Time Systems (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
(4 more...)

Dunn, Oliver, Aslansefat, Koorosh, Papadopoulos, Yiannis

Q-SafeML: Safety Assessment of Quantum Machine Learning via Quantum Distance Metrics

The rise of machine learning in safety-critical systems has paralleled advancements in quantum computing, leading to the emerging field of Quantum Machine Learning (QML). While safety monitoring has progressed in classical ML, existing methods are not directly applicable to QML due to fundamental differences in quantum computation. Given the novelty of QML, dedicated safety mechanisms remain underdeveloped. This paper introduces Q-SafeML, a safety monitoring approach for QML. The method builds on SafeML, a recent method that utilizes statistical distance measures to assess model accuracy and provide confidence in the reasoning of an algorithm. An adapted version of Q-SafeML incorporates quantum-centric distance measures, aligning with the probabilistic nature of QML outputs. This shift to a model-dependent, post-classification evaluation represents a key departure from classical SafeML, which is dataset-driven and classifier-agnostic. The distinction is motivated by the unique representational constraints of quantum systems, requiring distance metrics defined over quantum state spaces. Q-SafeML detects distances between operational and training data addressing the concept drifts in the context of QML. Experiments on QCNN and VQC Models show that this enables informed human oversight, enhancing system transparency and safety.

artificial intelligence, machine learning, safeml, (17 more...)

2509.04536

Country:

Europe > United Kingdom (0.46)
Europe > Germany (0.29)

Genre: Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Hierarchical Section Matching Prediction (HSMP) BERT for Fine-Grained Extraction of Structured Data from Hebrew Free-Text Radiology Reports in Crohn's Disease

Badash, Zvi, Ben-Atya, Hadas, Gavrielov, Naama, Hazan, Liam, Focht, Gili, Cytter-Kuint, Ruth, Hagopian, Talar, Turner, Dan, Freiman, Moti

Extracting structured clinical information from radiology reports is challenging, especially in low-resource languages. This is pronounced in Crohn's disease, with sparsely represented multi-organ findings. We developed Hierarchical Structured Matching Prediction BERT (HSMP-BERT), a prompt-based model for extraction from Hebrew radiology text. In an administrative database study, we analyzed 9,683 reports from Crohn's patients imaged 2010-2023 across Israeli providers. A subset of 512 reports was radiologist-annotated for findings across six gastrointestinal organs and 15 pathologies, yielding 90 structured labels per subject. Multilabel-stratified split (66% train+validation; 33% test), preserving label prevalence. Performance was evaluated with accuracy, F1, Cohen's $κ$, AUC, PPV, NPV, and recall. On 24 organ-finding combinations with $>$15 positives, HSMP-BERT achieved mean F1 0.83$\pm$0.08 and $κ$ 0.65$\pm$0.17, outperforming the SMP zero-shot baseline (F1 0.49$\pm$0.07, $κ$ 0.06$\pm$0.07) and standard fine-tuning (F1 0.30$\pm$0.27, $κ$ 0.27$\pm$0.34; paired t-test $p < 10^{-7}$). Hierarchical inference cuts runtime 5.1$\times$ vs. traditional inference. Applied to all reports, it revealed associations among ileal wall thickening, stenosis, and pre-stenotic dilatation, plus age- and sex-specific trends in inflammatory findings. HSMP-BERT offers a scalable solution for structured extraction in radiology, enabling population-level analysis of Crohn's disease and demonstrating AI's potential in low-resource settings.

bioinformatics, large language model, machine learning, (17 more...)

2509.04519

Country: Asia > Middle East > Israel (0.29)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Biomedical Informatics (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Materazzini, Michele, Morciano, Gianluca, Alcalde-Llergo, Jose Manuel, Yeguas-Bolivar, Enrique, Calabro, Giuseppe, Zingoni, Andrea, Taborri, Juri

Combine Virtual Reality and Machine-Learning to Identify the Presence of Dyslexia: A Cross-Linguistic Approach

This study explores the use of virtual reality (VR) and artificial intelligence (AI) to predict the presence of dyslexia in Italian and Spanish university students. In particular, the research investigates whether VR-derived data from Silent Reading (SR) tests and self-esteem assessments can differentiate between students that are affected by dyslexia and students that are not, employing machine learning (ML) algorithms. Participants completed VR-based tasks measuring reading performance and self-esteem. A preliminary statistical analysis (t tests and Mann Whitney tests) on these data was performed, to compare the obtained scores between individuals with and without dyslexia, revealing significant differences in completion time for the SR test, but not in accuracy, nor in self esteem. Then, supervised ML models were trained and tested, demonstrating an ability to classify the presence/absence of dyslexia with an accuracy of 87.5 per cent for Italian, 66.6 per cent for Spanish, and 75.0 per cent for the pooled group. These findings suggest that VR and ML can effectively be used as supporting tools for assessing dyslexia, particularly by capturing differences in task completion speed, but language-specific factors may influence classification accuracy.

artificial intelligence, human computer interaction, machine learning, (19 more...)

doi: 10.3390/info16090719

2509.0451

Country: Europe > Italy (0.68)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)

Can Multiple Responses from an LLM Reveal the Sources of Its Uncertainty?

Nan, Yang, He, Pengfei, Tandon, Ravi, Xu, Han

Large language models (LLMs) have delivered significant breakthroughs across diverse domains but can still produce unreliable or misleading outputs, posing critical challenges for real-world applications. While many recent studies focus on quantifying model uncertainty, relatively little work has been devoted to \textit{diagnosing the source of uncertainty}. In this study, we show that, when an LLM is uncertain, the patterns of disagreement among its multiple generated responses contain rich clues about the underlying cause of uncertainty. To illustrate this point, we collect multiple responses from a target LLM and employ an auxiliary LLM to analyze their patterns of disagreement. The auxiliary model is tasked to reason about the likely source of uncertainty, such as whether it stems from ambiguity in the input question, a lack of relevant knowledge, or both. In cases involving knowledge gaps, the auxiliary model also identifies the specific missing facts or concepts contributing to the uncertainty. In our experiment, we validate our framework on AmbigQA, OpenBookQA, and MMLU-Pro, confirming its generality in diagnosing distinct uncertainty sources. Such diagnosis shows the potential for relevant manual interventions that improve LLM performance and reliability.

large language model, machine learning, natural language, (18 more...)

2509.04464

Country: North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection

Chen, Yihan, Chen, Jiawei, Mo, Guozhao, Chen, Xuanang, He, Ben, Han, Xianpei, Sun, Le

The growing integration of large language models (LLMs) into the peer review process presents potential risks to the fairness and reliability of scholarly evaluation. While LLMs offer valuable assistance for reviewers with language refinement, there is growing concern over their use to generate substantive review content. Existing general AI-generated text detectors are vulnerable to paraphrasing attacks and struggle to distinguish between surface language refinement and substantial content generation, suggesting that they primarily rely on stylistic cues. When applied to peer review, this limitation can result in unfairly suspecting reviews with permissible AI-assisted language enhancement, while failing to catch deceptively humanized AI-generated reviews. To address this, we propose a paradigm shift from style-based to content-based detection. Specifically, we introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews, covering six distinct modes of human-AI collaboration. Furthermore, we develop CoCoDet, an AI review detector via a multi-task learning framework, designed to achieve more accurate and robust detection of AI involvement in review content. Our work offers a practical foundation for evaluating the use of LLMs in peer review, and contributes to the development of more precise, equitable, and reliable detection methods for real-world scholarly applications.

detector, large language model, machine learning, (21 more...)

2509.0446

Country:

North America > United States (0.46)
North America > Mexico (0.28)

Genre:

Overview (0.87)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

MHSNet:An MoE-based Hierarchical Semantic Representation Network for Accurate Duplicate Resume Detection with Large Language Model

Li, Yu, Chen, Zulong, Xu, Wenjian, Wen, Hong, Yu, Yipeng, Yiu, Man Lung, Yin, Yuyu

To maintain the company's talent pool, recruiters need to continuously search for resumes from third-party websites (e.g., LinkedIn, Indeed). However, fetched resumes are often incomplete and inaccurate. To improve the quality of third-party resumes and enrich the company's talent pool, it is essential to conduct duplication detection between the fetched resumes and those already in the company's talent pool. Such duplication detection is challenging due to the semantic complexity, structural heterogeneity, and information incompleteness of resume texts. To this end, we propose MHSNet, an multi-level identity verification framework that fine-tunes BGE-M3 using contrastive learning. With the fine-tuned , Mixture-of-Experts (MoE) generates multi-level sparse and dense representations for resumes, enabling the computation of corresponding multi-level semantic similarities. Moreover, the state-aware Mixture-of-Experts (MoE) is employed in MHSNet to handle diverse incomplete resumes. Experimental results verify the effectiveness of MHSNet

artificial intelligence, machine learning, natural language, (17 more...)

2508.13676

Country: Asia > China > Zhejiang Province (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)