AITopics

The growing integration of large language models (LLMs) into the peer review process presents potential risks to the fairness and reliability of scholarly evaluation. While LLMs offer valuable assistance for reviewers with language refinement, there is growing concern over their use to generate substantive review content. Existing general AI-generated text detectors are vulnerable to paraphrasing attacks and struggle to distinguish between surface language refinement and substantial content generation, suggesting that they primarily rely on stylistic cues. When applied to peer review, this limitation can result in unfairly suspecting reviews with permissible AI-assisted language enhancement, while failing to catch deceptively humanized AI-generated reviews. To address this, we propose a paradigm shift from style-based to content-based detection. Specifically, we introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews, covering six distinct modes of human-AI collaboration. Furthermore, we develop CoCoDet, an AI review detector via a multi-task learning framework, designed to achieve more accurate and robust detection of AI involvement in review content. Our work offers a practical foundation for evaluating the use of LLMs in peer review, and contributes to the development of more precise, equitable, and reliable detection methods for real-world scholarly applications.

detector, large language model, machine learning, (21 more...)

2509.0446

Country:

North America > United States (0.46)
North America > Mexico (0.28)

Genre:

Overview (0.87)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Dutta, Anandi, Mruthyunjaya, Shivani, Saddington, Jessica, Islam, Kazi Sifatul

Mentalic Net: Development of RAG-based Conversational AI and Evaluation Framework for Mental Health Support

The emergence of large language models (LLMs) has unlocked boundless possibilities, along with significant challenges. In response, we developed a mental health support chatbot designed to augment professional healthcare, with a strong emphasis on safe and meaningful application. Our approach involved rigorous evaluation, covering accuracy, empathy, trustworthiness, privacy, and bias. We employed a retrieval-augmented generation (RAG) framework, integrated prompt engineering, and fine-tuned a pre-trained model on novel datasets. The resulting system, Mentalic Net Conversational AI, achieved a BERT Score of 0.898, with other evaluation metrics falling within satisfactory ranges. We advocate for a human-in-the-loop approach and a long-term, responsible strategy in developing such transformative technologies, recognizing both their potential to change lives and the risks they may pose if not carefully managed.

large language model, machine learning, natural language, (18 more...)

2509.04456

Country: North America > United States (1.00)

Genre:

Research Report (0.50)
Overview (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Kabongo, Ben, Guigue, Vincent, Lemberger, Pirmin

ELIXIR: Efficient and LIghtweight model for eXplaIning Recommendations

Collaborative filtering drives many successful recommender systems but struggles with fine-grained user-item interactions and explainability. As users increasingly seek transparent recommendations, generating textual explanations through language models has become a critical research area. Existing methods employ either RNNs or Transformers. However, RNN-based approaches fail to leverage the capabilities of pre-trained Transformer models, whereas Transformer-based methods often suffer from suboptimal adaptation and neglect aspect modeling, which is crucial for personalized explanations. We propose ELIXIR (Efficient and LIghtweight model for eXplaIning Recommendations), a multi-task model combining rating prediction with personalized review generation. ELIXIR jointly learns global and aspect-specific representations of users and items, optimizing overall rating, aspect-level ratings, and review generation, with personalized attention to emphasize aspect importance. Based on a T5-small (60M) model, we demonstrate the effectiveness of our aspect-based architecture in guiding text generation in a personalized context, where state-of-the-art approaches exploit much larger models but fail to match user preferences as well. Experimental results on TripAdvisor and RateBeer demonstrate that ELIXIR significantly outperforms strong baseline models, especially in review generation.

large language model, machine learning, natural language, (18 more...)

2508.20312

Country:

Europe (0.70)
North America > United States (0.28)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction

Zeng, Zhiyuan, Liu, Jiashuo, Chen, Siyuan, He, Tianci, Liao, Yali, Tian, Yixiao, Wang, Jinpeng, Wang, Zaiyuan, Yang, Yang, Yin, Lingyue, Yin, Mingren, Zhu, Zhenwei, Cai, Tianle, Chen, Zehui, Chen, Jiecao, Du, Yantao, Gao, Xiang, Guo, Jiacheng, Hu, Liang, Jiao, Jianpeng, Li, Xiangsheng, Liu, Jingkai, Ni, Shuang, Wen, Zhoufutu, Zhang, Ge, Zhang, Kaiyuan, Zhou, Xin, Blanchet, Jose, Qiu, Xipeng, Wang, Mengdi, Huang, Wenhao

Future prediction is a complex task for LLM agents, requiring a high level of analytical thinking, information gathering, contextual understanding, and decision-making under uncertainty. Agents must not only gather and interpret vast amounts of dynamic information but also integrate diverse data sources, weigh uncertainties, and adapt predictions based on emerging trends, just as human experts do in fields like politics, economics, and finance. Despite its importance, no large-scale benchmark exists for evaluating agents on future prediction, largely due to challenges in handling real-time updates and retrieving timely, accurate answers. To address this, we introduce $\textbf{FutureX}$, a dynamic and live evaluation benchmark specifically designed for LLM agents performing future prediction tasks. FutureX is the largest and most diverse live benchmark for future prediction, supporting real-time daily updates and eliminating data contamination through an automated pipeline for question gathering and answer collection. We evaluate 25 LLM/agent models, including those with reasoning, search capabilities, and integration of external tools such as the open-source Deep Research Agent and closed-source Deep Research models. This comprehensive evaluation assesses agents' adaptive reasoning and performance in dynamic environments. Additionally, we provide in-depth analyses of agents' failure modes and performance pitfalls in future-oriented tasks, including the vulnerability to fake web pages and the temporal validity. Our goal is to establish a dynamic, contamination-free evaluation standard that drives the development of LLM agents capable of performing at the level of professional human analysts in complex reasoning and predictive thinking.

large language model, machine learning, natural language, (19 more...)

2508.11987

Country:

Asia > China (0.68)
North America > United States > California > Los Angeles County (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (0.87)

Industry:

Media > News (1.00)
Leisure & Entertainment > Sports (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Hatfaludi, Cosmin-Andrei, Serban, Alex

Foundational Models and Federated Learning: Survey, Taxonomy, Challenges and Practical Insights

Federated learning has the potential to unlock siloed data and distributed resources by enabling collaborative model training without sharing private data. As more complex foundational models gain widespread use, the need to expand training resources and integrate privately owned data grows as well. In this article, we explore the intersection of federated learning and foundational models, aiming to identify, categorize, and characterize technical methods that integrate the two paradigms. As a unified survey is currently unavailable, we present a literature survey structured around a novel taxonomy that follows the development life-cycle stages, along with a technical comparison of available methods. Additionally, we provide practical insights and guidelines for implementing and evolving these methods, with a specific focus on the healthcare domain as a case study, where the potential impact of federated learning and foundational models is considered significant. Our survey covers multiple intersecting topics, including but not limited to federated learning, self-supervised learning, fine-tuning, distillation, and transfer learning. Initially, we retrieved and reviewed a set of over 4,200 articles. This collection was narrowed to more than 250 thoroughly reviewed articles through inclusion criteria, featuring 42 unique methods. The methods were used to construct the taxonomy and enabled their comparison based on complexity, efficiency, and scalability. We present these results as a self-contained overview that not only summarizes the state of the field but also provides insights into the practical aspects of adopting, evolving, and integrating foundational models with federated learning.

artificial intelligence, federated learning, machine learning, (18 more...)

doi: 10.7717/peerj-cs.2993

2509.05142

Country: Europe > Romania (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Hill, Brennen, Parla, Surendra, Balabhadruni, Venkata Abhijeeth, Padmalayam, Atharv Prajod, Sharma, Sujay Chandra Shekara

Breaking to Build: A Threat Model of Prompt-Based Attacks for Securing LLMs

The proliferation of Large Language Models (LLMs) has introduced critical security challenges, where adversarial actors can manipulate input prompts to cause significant harm and circumvent safety alignments. These prompt-based attacks exploit vulnerabilities in a model's design, training, and contextual understanding, leading to intellectual property theft, misinformation generation, and erosion of user trust. A systematic understanding of these attack vectors is the foundational step toward developing robust countermeasures. This paper presents a comprehensive literature survey of prompt-based attack methodologies, categorizing them to provide a clear threat model. By detailing the mechanisms and impacts of these exploits, this survey aims to inform the research community's efforts in building the next generation of secure LLMs that are inherently resistant to unauthorized distillation, fine-tuning, and editing.

large language model, machine learning, natural language, (21 more...)

2509.04615

Country:

Asia (0.28)
North America > United States > Wisconsin (0.16)

Genre: Overview (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

van Leeuwen, Tristan, Brune, Christoph, Carioni, Marcello

An invertible generative model for forward and inverse problems

arXiv.org Machine LearningSep-5-2025

We formulate the inverse problem in a Bayesian framework and aim to train a generative model that allows us to simulate (i.e., sample from the likelihood) and do inference (i.e., sample from the posterior). We review the use of triangular normalizing flows for conditional sampling in this context and show how to combine two such triangular maps (an upper and a lower one) in to one invertible mapping that can be used for simulation and inference. We work out several useful properties of this invertible generative model and propose a possible training loss for training the map directly. We illustrate the workings of this new approach to conditional generative modeling numerically on a few stylized examples.

inverse problem, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

2509.0391

Country:

North America (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre:

Research Report (0.50)
Overview (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Walker, Connor, Aslansefat, Koorosh, Akram, Mohammad Naveed, Papadopoulos, Yiannis

RAGuard: A Novel Approach for in-context Safe Retrieval Augmented Generation for LLMs

arXiv.org Machine LearningSep-5-2025

Accuracy and safety are paramount in Offshore Wind (OSW) maintenance, yet conventional Large Language Models (LLMs) often fail when confronted with highly specialised or unexpected scenarios. We introduce RAGuard, an enhanced Retrieval-Augmented Generation (RAG) framework that explicitly integrates safety-critical documents alongside technical manuals.By issuing parallel queries to two indices and allocating separate retrieval budgets for knowledge and safety, RAGuard guarantees both technical depth and safety coverage. We further develop a SafetyClamp extension that fetches a larger candidate pool, "hard-clamping" exact slot guarantees to safety. We evaluate across sparse (BM25), dense (Dense Passage Retrieval) and hybrid retrieval paradigms, measuring Technical Recall@K and Safety Recall@K. Both proposed extensions of RAG show an increase in Safety Recall@K from almost 0\% in RAG to more than 50\% in RAGuard, while maintaining Technical Recall above 60\%. These results demonstrate that RAGuard and SafetyClamp have the potential to establish a new standard for integrating safety assurance into LLM-powered decision support in critical maintenance contexts.

large language model, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2509.03768

Country: Europe > United Kingdom > England > East Yorkshire > Hull (0.04)

Genre:

Research Report > Promising Solution (0.50)
Overview > Innovation (0.40)
Research Report > New Finding (0.34)

Industry:

Information Technology (0.93)
Energy (0.89)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Jones, Charles, Glocker, Ben

A Primer on Causal and Statistical Dataset Biases for Fair and Robust Image Analysis

arXiv.org Machine LearningSep-5-2025

Machine learning methods often fail when deployed in the real world. Worse still, they fail in high-stakes situations and across socially sensitive lines. These issues have a chilling effect on the adoption of machine learning methods in settings such as medical diagnosis, where they are arguably best-placed to provide benefits if safely deployed. In this primer, we introduce the causal and statistical structures which induce failure in machine learning methods for image analysis. We highlight two previously overlooked problems, which we call the \textit{no fair lunch} problem and the \textit{subgroup separability} problem. We elucidate why today's fair representation learning methods fail to adequately solve them and propose potential paths forward for the field.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

2509.04295

Genre:

Research Report (0.64)
Overview (0.42)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Seisa, Achilleas Santi, Sankaranarayanan, Viswa Narayanan, Damigos, Gerasimos, Satpute, Sumeet Gajanan, Nikolakopoulos, George

Cloud-Assisted Remote Control for Aerial Robots: From Theory to Proof-of-Concept Implementation

arXiv.org Artificial IntelligenceSep-5-2025

Cloud robotics has emerged as a promising technology for robotics applications due to its advantages of offloading computationally intensive tasks, facilitating data sharing, and enhancing robot coordination. However, integrating cloud computing with robotics remains a complex challenge due to network latency, security concerns, and the need for efficient resource management. In this work, we present a scalable and intuitive framework for testing cloud and edge robotic systems. The framework consists of two main components enabled by containerized technology: (a) a containerized cloud cluster and (b) the containerized robot simulation environment. The system incorporates two endpoints of a User Datagram Protocol (UDP) tunnel, enabling bidirectional communication between the cloud cluster container and the robot simulation environment, while simulating realistic network conditions. To achieve this, we consider the use case of cloud-assisted remote control for aerial robots, while utilizing Linux-based traffic control to introduce artificial delay and jitter, replicating variable network conditions encountered in practical cloud-robot deployments.

application, artificial intelligence, container, (15 more...)

doi: 10.1109/CCGridW65158.2025.00032

2509.04095

Country: Europe (0.14)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Information Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)