Goto

Collaborating Authors

 semantic search


Enhancing Clinical Note Generation with ICD-10, Clinical Ontology Knowledge Graphs, and Chain-of-Thought Prompting Using GPT-4

Makohon, Ivan, Najafi, Mohamad, Wu, Jian, Brochhausen, Mathias, Li, Yaohang

arXiv.org Artificial Intelligence

In the past decade a surge in the amount of electronic health record (EHR) data in the United States, attributed to a favorable policy environment created by the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 and the 21st Century Cures Act of 2016. Clinical notes for patients' assessments, diagnoses, and treatments are captured in these EHRs in free-form text by physicians, who spend a considerable amount of time entering and editing them. Manually writing clinical notes takes a considerable amount of a doctor's valuable time, increasing the patient's waiting time and possibly delaying diagnoses. Large language models (LLMs) possess the ability to generate news articles that closely resemble human-written ones. We investigate the usage of Chain-of-Thought (CoT) prompt engineering to improve the LLM's response in clinical note generation. In our prompts, we use as input International Classification of Diseases (ICD) codes and basic patient information. We investigate a strategy that combines the traditional CoT with semantic search results to improve the quality of generated clinical notes. Additionally, we infuse a knowledge graph (KG) built from clinical ontology to further enrich the domain-specific knowledge of generated clinical notes. We test our prompting technique on six clinical cases from the CodiEsp test dataset using GPT-4 and our results show that it outperformed the clinical notes generated by standard one-shot prompts.


Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks

Zheng, Yicong, McKee, Kevin L., Miconi, Thomas, Bugaud, Zacharie, van Gelderen, Mick, McCaleb, Jed

arXiv.org Artificial Intelligence

How to enable human-like long-term memory in large language models (LLMs) has been a central question for unlocking more general capabilities such as few-shot generalization. Existing memory frameworks and benchmarks focus on finding the optimal memory compression algorithm for higher performance in tasks that require recollection and sometimes further reasoning. However, such efforts have ended up building more human bias into the compression algorithm, through the search for the best prompts and memory architectures that suit specific benchmarks, rather than finding a general solution that would work on other data distributions. On the other hand, goal-directed search on uncompressed information could potentially exhibit superior performance because compression is lossy, and a predefined compression algorithm will not fit all raw data distributions. Here we present SUMER (Search in Uncompressed Memory via Experience Replay), an end-to-end reinforcement learning agent with verifiable reward (RLVR) that learns to use search tools to gather information and answer a target question. On the LoCoMo dataset for long-context conversation understanding, SUMER with Qwen2.5-7B-Instruct learned to use search tools and outperformed all other biased memory compression approaches and also the full-context baseline, reaching SOTA performance (43% gain over the prior best). We demonstrate that a simple search method applied to raw data outperforms goal-agnostic and biased compression algorithms in current long-context memory tasks, arguing for new paradigms and benchmarks that are more dynamic and autonomously scalable. Code for SUMER and all implemented baselines is publicly available at https://github.com/zycyc/SUMER.


MIMIC-\RNum{4}-Ext-22MCTS: A 22 Millions-Event Temporal Clinical Time-Series Dataset with Relative Timestamp for Risk Prediction

Wang, Jing, Niu, Xing, Zhang, Tong, Shen, Jie, Kim, Juyong, Weiss, Jeremy C.

arXiv.org Artificial Intelligence

A crucial component for clinical risk prediction is developing a reliable prediction model is collecting high-quality time series clinical events. In this work, we release such a dataset that consists of 22,588,586 Clinical Time Series events, which we term MIMIC-\RNum{4}-Ext-22MCTS. Our source data are discharge summaries selected from the well-known yet unstructured MIMIC-IV-Note \cite{Johnson2023-pg}. The general-purpose MIMIC-IV-Note pose specific challenges for our work: it turns out that the discharge summaries are too lengthy for typical natural language models to process, and the clinical events of interest often are not accompanied with explicit timestamps. Therefore, we propose a new framework that works as follows: 1) we break each discharge summary into manageably small text chunks; 2) we apply contextual BM25 and contextual semantic search to retrieve chunks that have a high potential of containing clinical events; and 3) we carefully design prompts to teach the recently released Llama-3.1-8B \cite{touvron2023llama} model to identify or infer temporal information of the chunks. The obtained dataset is informative and transparent that standard models fine-tuned on the dataset achieves significant improvements in healthcare applications. In particular, the BERT model fine-tuned based on our dataset achieves 10\% improvement in accuracy on medical question answering task, and 3\% improvement in clinical trial matching task compared with the classic BERT. The dataset is available at https://physionet.org/content/mimic-iv-ext-22mcts/1.0.0. The codebase is released at https://github.com/JingWang-RU/MIMIC-IV-Ext-22MCTS-Temporal-Clinical-Time-Series-Dataset.


Query Attribute Modeling: Improving search relevance with Semantic Search and Meta Data Filtering

Menon, Karthik, Haider, Batool Arhamna, Arham, Muhammad, Mehreen, Kanwal, Kadiyala, Ram Mohan Rao, Farooq, Hamza

arXiv.org Artificial Intelligence

This study introduces Query Attribute Modeling (QAM), a hybrid framework that enhances search precision and relevance by decomposing open text queries into structured metadata tags and semantic elements. QAM addresses traditional search limitations by automatically extracting metadata filters from free-form text queries, reducing noise and enabling focused retrieval of relevant items. Experimental evaluation using the Amazon Toys Reviews dataset (10,000 unique items with 40,000+ reviews and detailed product attributes) demonstrated QAM's superior performance, achieving a mean average precision at 5 (mAP@5) of 52.99\%. This represents significant improvement over conventional methods, including BM25 keyword search, encoder-based semantic similarity search, cross-encoder re-ranking, and hybrid search combining BM25 and semantic results via Reciprocal Rank Fusion (RRF). The results establish QAM as a robust solution for Enterprise Search applications, particularly in e-commerce systems.


SAMEP: A Secure Protocol for Persistent Context Sharing Across AI Agents

Masoor, Hari

arXiv.org Artificial Intelligence

Current AI agent architectures suffer from ephemeral memory limitations, preventing effective collaboration and knowledge sharing across sessions and agent boundaries. We introduce SAMEP (Secure Agent Memory Exchange Protocol), a novel framework that enables persistent, secure, and semantically searchable memory sharing among AI agents. Our protocol addresses three critical challenges: (1) persistent context preservation across agent sessions, (2) secure multi-agent collaboration with fine-grained access control, and (3) efficient semantic discovery of relevant historical context. SAMEP implements a distributed memory repository with vector-based semantic search, cryptographic access controls (AES-256-GCM), and standardized APIs compatible with existing agent communication protocols (MCP, A2A). We demonstrate SAMEP's effectiveness across diverse domains including multi-agent software development, healthcare AI with HIPAA compliance, and multi-modal processing pipelines. Experimental results show 73% reduction in redundant computations, 89% improvement in context relevance scores, and complete compliance with regulatory requirements including audit trail generation. SAMEP enables a new paradigm of persistent, collaborative AI agent ecosystems while maintaining security and privacy guarantees.


eLog analysis for accelerators: status and future outlook

Sulc, Antonin, Hellert, Thorsten, Reed, Aaron, Carpenter, Adam, Bien, Alex, Tennant, Chris, Bisegni, Claudio, Lersch, Daniel, Ratner, Daniel, Lawrence, David, McSpadden, Diana, Hoschouer, Hayden, John, Jason St., Britton, Thomas

arXiv.org Artificial Intelligence

This work demonstrates electronic logbook (eLog) systems leveraging modern AI-driven information retrieval capabilities at the accelerator facilities of Fermilab, Jefferson Lab, Lawrence Berkeley National Laboratory (LBNL), SLAC National Accelerator Laboratory. We evaluate contemporary tools and methodologies for information retrieval with Retrieval Augmented Generation (RAGs), focusing on operational insights and integration with existing accelerator control systems. The study addresses challenges and proposes solutions for state-of-the-art eLog analysis through practical implementations, demonstrating applications and limitations. We present a framework for enhancing accelerator facility operations through improved information accessibility and knowledge management, which could potentially lead to more efficient operations.


Talking to Data: Designing Smart Assistants for Humanities Databases

Sergeev, Alexander, Goloviznina, Valeriya, Melnichenko, Mikhail, Kotelnikov, Evgeny

arXiv.org Artificial Intelligence

Access to humanities research databases is often hindered by the limitations of traditional interaction formats, particularly in the methods of searching and response generation. This study introduces an LLM-based smart assistant designed to facilitate natural language communication with digital humanities data. The assistant, developed in a chatbot format, leverages the RAG approach and integrates state-of-the-art technologies such as hybrid search, automatic query generation, text-to-SQL filtering, semantic database search, and hyperlink insertion. To evaluate the effectiveness of the system, experiments were conducted to assess the response quality of various language models. The testing was based on the Prozhito digital archive, which contains diary entries from predominantly Russian-speaking individuals who lived in the 20th century. The chatbot is tailored to support anthropology and history researchers, as well as non-specialist users with an interest in the field, without requiring prior technical training. By enabling researchers to query complex databases with natural language, this tool aims to enhance accessibility and efficiency in humanities research. The study highlights the potential of Large Language Models to transform the way researchers and the public interact with digital archives, making them more intuitive and inclusive. Additional materials are presented in GitHub repository: https://github.com/alekosus/talking-to-data-intersys2025.


A Grounded Memory System For Smart Personal Assistants

Ocker, Felix, Deigmöller, Jörg, Smirnov, Pavel, Eggert, Julian

arXiv.org Artificial Intelligence

A wide variety of agentic AI applications - ranging from cognitive assistants for dementia patients to robotics - demand a robust memory system grounded in reality. In this paper, we propose such a memory system consisting of three components. First, we combine Vision Language Models for image captioning and entity disambiguation with Large Language Models for consistent information extraction during perception. Second, the extracted information is represented in a memory consisting of a knowledge graph enhanced by vector embeddings to efficiently manage relational information. Third, we combine semantic search and graph query generation for question answering via Retrieval Augmented Generation. We illustrate the system's working and potential using a real-world example.


Building Scalable AI-Powered Applications with Cloud Databases: Architectures, Best Practices and Performance Considerations

Bhupathi, Santosh

arXiv.org Artificial Intelligence

This paper explores how cloud-native databases enable AI-driven applications by leveraging purpose-built technologies such as vector databases (pgvector), graph databases (AWS Neptune), NoSQL stores (Amazon DocumentDB, DynamoDB), and relational cloud databases (Aurora MySQL and PostgreSQL). It presents architectural patterns for integrating AI workloads with cloud databases, including Retrieval-Augmented Generation (RAG) [1] with LLMs, real-time data pipelines, AI-driven query optimization, and embeddings-based search. Performance benchmarks, scalability considerations, and cost-efficient strategies are evaluated to guide the design of AI-enabled applications. Real-world case studies from industries such as healthcare, finance, and customer experience illustrate how enterprises utilize cloud databases to enhance AI capabilities while ensuring security, governance, and compliance with enterprise and regulatory standards. By providing a comprehensive analysis of AI and cloud database integration, this paper serves as a practical guide for researchers, architects, and enterprises to build next-generation AI applications that optimize performance, scalability, and cost efficiency in cloud environments.


Langformers: Unified NLP Pipelines for Language Models

Lamsal, Rabindra, Read, Maria Rodriguez, Karunasekera, Shanika

arXiv.org Artificial Intelligence

Transformer-based language models have revolutionized the field of natural language processing (NLP). However, using these models often involves navigating multiple frameworks and tools, as well as writing repetitive boilerplate code. This complexity can discourage non-programmers and beginners, and even slow down prototyping for experienced developers. To address these challenges, we introduce Langformers, an open-source Python library designed to streamline NLP pipelines through a unified, factory-based interface for large language model (LLM) and masked language model (MLM) tasks. Langformers integrates conversational AI, MLM pretraining, text classification, sentence embedding/reranking, data labelling, semantic search, and knowledge distillation into a cohesive API, supporting popular platforms such as Hugging Face and Ollama. Key innovations include: (1) task-specific factories that abstract training, inference, and deployment complexities; (2) built-in memory and streaming for conversational agents; and (3) lightweight, modular design that prioritizes ease of use. Documentation: https://langformers.com