Overview
Stylometry recognizes human and LLM-generated texts in short samples
Przystalski, Karol, Argasiński, Jan K., Grabska-Gradzińska, Iwona, Ochab, Jeremi K.
The paper explores stylometry as a method to distinguish between texts created by Large Language Models (LLMs) and humans, addressing issues of model attribution, intellectual property, and ethical AI use. Stylometry has been used extensively to characterise the style and attribute authorship of texts. By applying it to LLM-generated texts, we identify their emergent writing patterns. The paper involves creating a benchmark dataset based on Wikipedia, with (a) human-written term summaries, (b) texts generated purely by LLMs (GPT-3.5/4, LLaMa 2/3, Orca, and Falcon), (c) processed through multiple text summarisation methods (T5, BART, Gensim, and Sumy), and (d) rephrasing methods (Dipper, T5). The 10-sentence long texts were classified by tree-based models (decision trees and LightGBM) using human-designed (StyloMetrix) and n-gram-based (our own pipeline) stylometric features that encode lexical, grammatical, syntactic, and punctuation patterns. The cross-validated results reached a performance of up to .87 Matthews correlation coefficient in the multiclass scenario with 7 classes, and accuracy between .79 and 1. in binary classification, with the particular example of Wikipedia and GPT-4 reaching up to .98 accuracy on a balanced dataset. Shapley Additive Explanations pinpointed features characteristic of the encyclopaedic text type, individual overused words, as well as a greater grammatical standardisation of LLMs with respect to human-written texts. These results show -- crucially, in the context of the increasingly sophisticated LLMs -- that it is possible to distinguish machine- from human-generated texts at least for a well-defined text type.
Explainable Mapper: Charting LLM Embedding Spaces Using Perturbation-Based Explanation and Verification Agents
Yan, Xinyuan, Sevastjanova, Rita, van der Ben, Sinie, El-Assady, Mennatallah, Wang, Bei
Large language models (LLMs) produce high-dimensional embeddings that capture rich semantic and syntactic relationships between words, sentences, and concepts. Investigating the topological structures of LLM embedding spaces via mapper graphs enables us to understand their underlying structures. Specifically, a mapper graph summarizes the topological structure of the embedding space, where each node represents a topological neighborhood (containing a cluster of embeddings), and an edge connects two nodes if their corresponding neighborhoods overlap. However, manually exploring these embedding spaces to uncover encoded linguistic properties requires considerable human effort. To address this challenge, we introduce a framework for semi-automatic annotation of these embedding properties. To organize the exploration process, we first define a taxonomy of explorable elements within a mapper graph such as nodes, edges, paths, components, and trajectories. The annotation of these elements is executed through two types of customizable LLM-based agents that employ perturbation techniques for scalable and automated analysis. These agents help to explore and explain the characteristics of mapper elements and verify the robustness of the generated explanations. We instantiate the framework within a visual analytics workspace and demonstrate its effectiveness through case studies. In particular, we replicate findings from prior research on BERT's embedding properties across various layers of its architecture and provide further observations into the linguistic properties of topological neighborhoods.
Simulating multiple human perspectives in socio-ecological systems using large language models
Zeng, Yongchao, Brown, Calum, Kyriakou, Ioannis, Hotz, Ronja, Rounsevell, Mark
Understanding socio-ecological systems requires insights from diverse stakeholder perspectives, which are often hard to access. To enable alternative, simulation-based exploration of different stakeholder perspectives, we develop the HoPeS (Human-Oriented Perspective Shifting) modelling framework. HoPeS employs agents powered by large language models (LLMs) to represent various stakeholders; users can step into the agent roles to experience perspectival differences. A simulation protocol serves as a "scaffold" to streamline multiple perspective-taking simulations, supporting users in reflecting on, transitioning between, and integrating across perspectives. A prototype system is developed to demonstrate HoPeS in the context of institutional dynamics and land use change, enabling both narrative-driven and numerical experiments. In an illustrative experiment, a user successively adopts the perspectives of a system observer and a researcher - a role that analyses data from the embedded land use model to inform evidence-based decision-making for other LLM agents representing various institutions. Despite the user's effort to recommend technically sound policies, discrepancies persist between the policy recommendation and implementation due to stakeholders' competing advocacies, mirroring real-world misalignment between researcher and policymaker perspectives. The user's reflection highlights the subjective feelings of frustration and disappointment as a researcher, especially due to the challenge of maintaining political neutrality while attempting to gain political influence. Despite this, the user exhibits high motivation to experiment with alternative narrative framing strategies, suggesting the system's potential in exploring different perspectives. Further system and protocol refinement are likely to enable new forms of interdisciplinary collaboration in socio-ecological simulations.
HySafe-AI: Hybrid Safety Architectural Analysis Framework for AI Systems: A Case Study
Pitale, Mandar, Frtunikj, Jelena, Priyadershi, Abhinaw, Singh, Vasu, Spence, Maria
AI has become integral to safety-critical areas like autonomous driving systems (ADS) and robotics. The architecture of recent autonomous systems are trending toward end-to-end (E2E) monolithic architectures such as large language models (LLMs) and vision language models (VLMs). In this paper, we review different architectural solutions and then evaluate the efficacy of common safety analyses such as failure modes and effect analysis (FMEA) and fault tree analysis (FTA). We show how these techniques can be improved for the intricate nature of the foundational models, particularly in how they form and utilize latent representations. We introduce HySAFE-AI, Hybrid Safety Architectural Analysis Framework for AI Systems, a hybrid framework that adapts traditional methods to evaluate the safety of AI systems. Lastly, we offer hints of future work and suggestions to guide the evolution of future AI safety standards.
VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings
Giahi, Ramin, Yao, Kehui, Kollipara, Sriram, Zhao, Kai, Mirjalili, Vahid, Xu, Jianpeng, Biswas, Topojoy, Korpeoglu, Evren, Achan, Kannan
Multimodal learning plays a critical role in e-commerce recommendation platforms today, enabling accurate recommendations and product understanding. However, existing vision-language models, such as CLIP, face key challenges in e-commerce recommendation systems: 1) Weak object-level alignment, where global image embeddings fail to capture fine-grained product attributes, leading to suboptimal retrieval performance; 2) Ambiguous textual representations, where product descriptions often lack contextual clarity, affecting cross-modal matching; and 3) Domain mismatch, as generic vision-language models may not generalize well to e-commerce-specific data. To address these limitations, we propose a framework, VL-CLIP, that enhances CLIP embeddings by integrating Visual Grounding for fine-grained visual understanding and an LLM-based agent for generating enriched text embeddings. Visual Grounding refines image representations by localizing key products, while the LLM agent enhances textual features by disambiguating product descriptions. Our approach significantly improves retrieval accuracy, multimodal retrieval effectiveness, and recommendation quality across tens of millions of items on one of the largest e-commerce platforms in the U.S., increasing CTR by 18.6%, ATC by 15.5%, and GMV by 4.0%. Additional experimental results show that our framework outperforms vision-language models, including CLIP, FashionCLIP, and GCL, in both precision and semantic alignment, demonstrating the potential of combining object-aware visual grounding and LLM-enhanced text representation for robust multimodal recommendations.
AI-based Clinical Decision Support for Primary Care: A Real-World Study
Korom, Robert, Kiptinness, Sarah, Adan, Najib, Said, Kassim, Ithuli, Catherine, Rotich, Oliver, Kimani, Boniface, King'ori, Irene, Kamau, Stellah, Atemba, Elizabeth, Aden, Muna, Bowman, Preston, Sharman, Michael, Hicks, Rebecca Soskin, Distler, Rebecca, Heidecke, Johannes, Arora, Rahul K., Singhal, Karan
We evaluate the impact of large language model-based clinical decision support in live care. In partnership with Penda Health, a network of primary care clinics in Nairobi, Kenya, we studied AI Consult, a tool that serves as a safety net for clinicians by identifying potential documentation and clinical decision-making errors. AI Consult integrates into clinician workflows, activating only when needed and preserving clinician autonomy. We conducted a quality improvement study, comparing outcomes for 39,849 patient visits performed by clinicians with or without access to AI Consult across 15 clinics. Visits were rated by independent physicians to identify clinical errors. Clinicians with access to AI Consult made relatively fewer errors: 16% fewer diagnostic errors and 13% fewer treatment errors. In absolute terms, the introduction of AI Consult would avert diagnostic errors in 22,000 visits and treatment errors in 29,000 visits annually at Penda alone. In a survey of clinicians with AI Consult, all clinicians said that AI Consult improved the quality of care they delivered, with 75% saying the effect was "substantial". These results required a clinical workflow-aligned AI Consult implementation and active deployment to encourage clinician uptake. We hope this study demonstrates the potential for LLM-based clinical decision support tools to reduce errors in real-world settings and provides a practical framework for advancing responsible adoption.
Privacy-Preserving Multimodal News Recommendation through Federated Learning
Khalaj, Mehdi, Najafabadi, Shahrzad Golestani, Vassileva, Julita
Personalized News Recommendation systems (PNR) have emerged as a solution to information overload by predicting and suggesting news items tailored to individual user interests. However, traditional PNR systems face several challenges, including an overreliance on textual content, common neglect of short-term user interests, and significant privacy concerns due to centralized data storage. This paper addresses these issues by introducing a novel multimodal federated learning-based approach for news recommendation. First, it integrates both textual and visual features of news items using a multimodal model, enabling a more comprehensive representation of content. Second, it employs a time-aware model that balances users' long-term and short-term interests through multi-head self-attention networks, improving recommendation accuracy. Finally, to enhance privacy, a federated learning framework is implemented, enabling collaborative model training without sharing user data. The framework divides the recommendation model into a large server-maintained news model and a lightweight user model shared between the server and clients. The client requests news representations (vectors) and a user model from the central server, then computes gradients with user local data, and finally sends their locally computed gradients to the server for aggregation. The central server aggregates gradients to update the global user model and news model. The updated news model is further used to infer news representation by the server. To further safeguard user privacy, a secure aggregation algorithm based on Shamir's secret sharing is employed. Experiments on a real-world news dataset demonstrate strong performance compared to existing systems, representing a significant advancement in privacy-preserving personalized news recommendation.
Citation Recommendation using Deep Canonical Correlation Analysis
McNamara, Conor, Ramlan, Effirul
Recent advances in citation recommendation have improved accuracy by leveraging multi-view representation learning to integrate the various modalities present in scholarly documents. However, effectively combining multiple data views requires fusion techniques that can capture complementary information while preserving the unique characteristics of each modality. We propose a novel citation recommendation algorithm that improves upon linear Canonical Correlation Analysis (CCA) methods by applying Deep CCA (DCCA), a neural network extension capable of capturing complex, non-linear relationships between distributed textual and graph-based representations of scientific articles. Experiments on the large-scale DBLP (Digital Bibliography & Library Project) citation network dataset demonstrate that our approach outperforms state-of-the-art CCA-based methods, achieving relative improvements of over 11% in Mean Average Precision@10, 5% in Precision@10, and 7% in Recall@10. These gains reflect more relevant citation recommendations and enhanced ranking quality, suggesting that DCCA's non-linear transformations yield more expressive latent representations than CCA's linear projections.
Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance
Parekh, Rishi, Gopalakrishnan, Saisubramaniam, Ahmad, Zishan, Deodhar, Anirudh
Analyzing large, complex output datasets from Discrete Event Simulations (DES) of warehouse operations to identify bottlenecks and inefficiencies is a critical yet challenging task, often demanding significant manual effort or specialized analytical tools. Our framework integrates Knowledge Graphs (KGs) and Large Language Model (LLM)-based agents to analyze complex Discrete Event Simulation (DES) output data from warehouse operations. It transforms raw DES data into a semantically rich KG, capturing relationships between simulation events and entities. An LLM-based agent uses iterative reasoning, generating interdependent sub-questions. For each sub-question, it creates Cypher queries for KG interaction, extracts information, and self-reflects to correct errors. This adaptive, iterative, and self-correcting process identifies operational issues mimicking human analysis. Our DES approach for warehouse bottleneck identification, tested with equipment breakdowns and process irregularities, outperforms baseline methods. For operational questions, it achieves near-perfect pass rates in pinpointing inefficiencies. For complex investigative questions, we demonstrate its superior diagnostic ability to uncover subtle, interconnected issues. This work bridges simulation modeling and AI (KG+LLM), offering a more intuitive method for actionable insights, reducing time-to-insight, and enabling automated warehouse inefficiency evaluation and diagnosis.
A Low-Cost Machine Learning Approach for Timber Diameter Estimation
Fard, Fatemeh Hasanzadeh, Fard, Sanaz Hasanzadeh, Jonoobi, Mehdi
The wood processing industry, particularly in facilities such as sawmills and MDF production lines, requires accurate and efficient identification of species and thickness of the wood. Although traditional methods rely heavily on expert human labor, they are slow, inconsistent, and prone to error, especially when processing large volumes. This study focuses on practical and cost-effective machine learning frameworks that automate the estimation of timber log diameter using standard RGB images captured under real-world working conditions. We employ the YOLOv5 object detection algorithm, fine-tuned on a public dataset (TimberSeg 1.0), to detect individual timber logs and estimate thickness through bounding-box dimensions. Unlike previous methods that require expensive sensors or controlled environments, this model is trained on images taken in typical industrial sheds during timber delivery. Experimental results show that the model achieves a mean Average Precision (mAP@0.5) of 0.64, demonstrating reliable log detection even with modest computing resources. This lightweight, scalable solution holds promise for practical integration into existing workflows, including on-site inventory management and preliminary sorting, particularly in small and medium-sized operations.