Goto

Collaborating Authors

 South America


Merging Clinical Knowledge into Large Language Models for Medical Research and Applications: A Survey

arXiv.org Artificial Intelligence

Clinical knowledge is the collection of information learned from studies on the causes, prognosis, diagnosis, and treatment of diseases. This type of knowledge can improve curing performances, and promote physical health. With the emergence of large language models (LLMs), medical artificial intelligence (medical AI), which aims to apply academic medical AI systems to real-world medical scenarios, has entered a new age of development, resulting in excellent works such as DoctorGPT and Pangu-Drug from academic and industrial researches. However, the field lacks a comprehensive compendium and comparison of building medical AI systems from academia and industry. Therefore, this survey focuses on the building paradigms of medical AI systems including the use of clinical databases, datasets, training pipelines, integrating medical knowledge graphs, system applications, and evaluation systems. We hope that this survey can help relevant practical researchers understand the current performance of academic models in various fields of healthcare, as well as the potential problems and future directions for implementing these scientific achievements.


TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) extends large language models (LLMs) with external data sources to enhance factual correctness and domain coverage. Modern RAG pipelines rely on large datastores, leading to system challenges in latency-sensitive deployments, especially when limited GPU memory is available. To address these challenges, we propose TeleRAG, an efficient inference system that reduces RAG latency with minimal GPU memory requirements. The core innovation of TeleRAG is lookahead retrieval, a prefetching mechanism that anticipates required data and transfers it from CPU to GPU in parallel with LLM generation. By leveraging the modularity of RAG pipelines, the inverted file index (IVF) search algorithm and similarities between queries, TeleRAG optimally overlaps data movement and computation. Experimental results show that TeleRAG reduces end-to-end RAG inference latency by up to 1.72x on average compared to state-of-the-art systems, enabling faster, more memory-efficient deployments of advanced RAG applications.


A Fused Gromov-Wasserstein Approach to Subgraph Contrastive Learning

arXiv.org Artificial Intelligence

Self-supervised learning has become a key method for training deep learning models when labeled data is scarce or unavailable. While graph machine learning holds great promise across various domains, the design of effective pretext tasks for self-supervised graph representation learning remains challenging. Contrastive learning, a popular approach in graph self-supervised learning, leverages positive and negative pairs to compute a contrastive loss function. However, current graph contrastive learning methods often struggle to fully use structural patterns and node similarities. To address these issues, we present a new method called Fused Gromov Wasserstein Subgraph Contrastive Learning (FOSSIL). Our model integrates node-level and subgraph-level contrastive learning, seamlessly combining a standard node-level contrastive loss with the Fused Gromov-Wasserstein distance. This combination helps our method capture both node features and graph structure together. Importantly, our approach works well with both homophilic and heterophilic graphs and can dynamically create views for generating positive and negative pairs. Through extensive experiments on benchmark graph datasets, we show that FOSSIL outperforms or achieves competitive performance compared to current state-of-the-art methods.


Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective

arXiv.org Artificial Intelligence

Large language models (LLMs) often exhibit abrupt emergent behavior, whereby new abilities arise at certain points during their training. This phenomenon, commonly referred to as a ''phase transition'', remains poorly understood. In this study, we conduct an integrative analysis of such phase transitions by examining three interconnected perspectives: the similarity between LLMs and the human brain, the internal states of LLMs, and downstream task performance. We propose a novel interpretation for the learning dynamics of LLMs that vary in both training data and architecture, revealing that three phase transitions commonly emerge across these models during training: (1) alignment with the entire brain surges as LLMs begin adhering to task instructions Brain Alignment and Instruction Following, (2) unexpectedly, LLMs diverge from the brain during a period in which downstream task accuracy temporarily stagnates Brain Detachment and Stagnation, and (3) alignment with the brain reoccurs as LLMs become capable of solving the downstream tasks Brain Realignment and Consolidation. These findings illuminate the underlying mechanisms of phase transitions in LLMs, while opening new avenues for interdisciplinary research bridging AI and neuroscience.


FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

arXiv.org Artificial Intelligence

Large language models (LLMs) encounter computational challenges during long-sequence inference, especially in the attention pre-filling phase, where the complexity grows quadratically with the prompt length. Previous efforts to mitigate these challenges have relied on fixed sparse attention patterns or identifying sparse attention patterns based on limited cases. However, these methods lacked the flexibility to efficiently adapt to varying input demands. In this paper, we introduce FlexPrefill, a Flexible sparse Pre-filling mechanism that dynamically adjusts sparse attention patterns and computational budget in real-time to meet the specific requirements of each input and attention head. The flexibility of our method is demonstrated through two key innovations: 1) Query-Aware Sparse Pattern Determination: By measuring Jensen-Shannon divergence, this component adaptively switches between query-specific diverse attention patterns and predefined attention patterns. 2) Cumulative-Attention Based Index Selection: This component dynamically selects query-key indexes to be computed based on different attention patterns, ensuring the sum of attention scores meets a predefined threshold. FlexPrefill adaptively optimizes the sparse pattern and sparse ratio of each attention head based on the prompt, enhancing efficiency in long-sequence inference tasks. Experimental results show significant improvements in both speed and accuracy over prior methods, providing a more flexible and efficient solution for LLM inference.


Regional climate projections using a deep-learning-based model-ranking and downscaling framework: Application to European climate zones

arXiv.org Artificial Intelligence

Accurate regional climate forecast calls for high-resolution downscaling of Global Climate Models (GCMs). This work presents a deep-learning-based multi-model evaluation and downscaling framework ranking 32 Coupled Model Intercomparison Project Phase 6 (CMIP6) models using a Deep Learning-TOPSIS (DL-TOPSIS) mechanism and so refines outputs using advanced deep-learning models. Using nine performance criteria, five K\"oppen-Geiger climate zones -- Tropical, Arid, Temperate, Continental, and Polar -- are investigated over four seasons. While TaiESM1 and CMCC-CM2-SR5 show notable biases, ranking results show that NorESM2-LM, GISS-E2-1-G, and HadGEM3-GC31-LL outperform other models. Four models contribute to downscaling the top-ranked GCMs to 0.1$^{\circ}$ resolution: Vision Transformer (ViT), Geospatial Spatiotemporal Transformer with Attention and Imbalance-Aware Network (GeoSTANet), CNN-LSTM, and CNN-Long Short-Term Memory (ConvLSTM). Effectively capturing temperature extremes (TXx, TNn), GeoSTANet achieves the highest accuracy (Root Mean Square Error (RMSE) = 1.57$^{\circ}$C, Kling-Gupta Efficiency (KGE) = 0.89, Nash-Sutcliffe Efficiency (NSE) = 0.85, Correlation ($r$) = 0.92), so reducing RMSE by 20% over ConvLSTM. CNN-LSTM and ConvLSTM do well in Continental and Temperate zones; ViT finds fine-scale temperature fluctuations difficult. These results confirm that multi-criteria ranking improves GCM selection for regional climate studies and transformer-based downscaling exceeds conventional deep-learning methods. This framework offers a scalable method to enhance high-resolution climate projections, benefiting impact assessments and adaptation plans.


Attention-Guided Integration of CLIP and SAM for Precise Object Masking in Robotic Manipulation

arXiv.org Artificial Intelligence

Attention-Guided Integration of CLIP and SAM for Precise Object Masking in Robotic Manipulation 1 st Muhammad A. Muttaqien Automation Research T eam National Institute of AIST Tokyo, Japan muha.muttaqien@aist.go.jp 2 nd Tomohiro Motoda Automation Research T eam National Institute of AIST Tokyo, Japan tomohiro.motoda@aist.go.jp 3 rd Ryo Hanai Automation Research T eam National Institute of AIST Tokyo, Japan ryo.hanai@aist.go.jp 4 th Domae Y ukiyasu Automation Research T eam National Institute of AIST Tokyo, Japan domae.yukiyasu@aist.go.jp Abstract --This paper introduces a novel pipeline to enhance the precision of object masking for robotic manipulation within the specific domain of masking products in convenience stores. The approach integrates two advanced AI models, CLIP and SAM, focusing on their synergistic combination and the effective use of multimodal data (image and text). Emphasis is placed on utilizing gradient-based attention mechanisms and customized datasets to fine-tune performance. While CLIP, SAM, and Grad-CAM are established components, their integration within this structured pipeline represents a significant contribution to the field. The resulting segmented masks, generated through this combined approach, can be effectively utilized as inputs for robotic systems, enabling more precise and adaptive object manipulation in the context of convenience store products. I NTRODUCTION In recent years, the ability to recognize and manipulate specific objects within well-defined domains, such as products in convenience stores, has become increasingly important in the field of robotic manipulation [1] [2] [3]. As robots are expected to perform more complex tasks in diverse environments, the need for precise object identification and interaction grows, particularly in domains where a high level of accuracy is crucial. For instance, in convenience stores (Figure 1), robots must reliably identify and handle a wide variety of products, each with unique visual characteristics, to automate tasks such as stocking, sorting, and customer assistance.


LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping

arXiv.org Artificial Intelligence

Digital soil mapping (DSM) relies on a broad pool of statistical methods, yet determining the optimal method for a given context remains challenging and contentious. Benchmarking studies on multiple datasets are needed to reveal strengths and limitations of commonly used methods. Existing DSM studies usually rely on a single dataset with restricted access, leading to incomplete and potentially misleading conclusions. To address these issues, we introduce an open-access dataset collection called Precision Liming Soil Datasets (LimeSoDa). LimeSoDa consists of 31 field- and farm-scale datasets from various countries. Each dataset has three target soil properties: (1) soil organic matter or soil organic carbon, (2) clay content and (3) pH, alongside a set of features. Features are dataset-specific and were obtained by optical spectroscopy, proximal- and remote soil sensing. All datasets were aligned to a tabular format and are ready-to-use for modeling. We demonstrated the use of LimeSoDa for benchmarking by comparing the predictive performance of four learning algorithms across all datasets. This comparison included multiple linear regression (MLR), support vector regression (SVR), categorical boosting (CatBoost) and random forest (RF). The results showed that although no single algorithm was universally superior, certain algorithms performed better in specific contexts. MLR and SVR performed better on high-dimensional spectral datasets, likely due to better compatibility with principal components. In contrast, CatBoost and RF exhibited considerably better performances when applied to datasets with a moderate number (< 20) of features. These benchmarking results illustrate that the performance of a method is highly context-dependent. LimeSoDa therefore provides an important resource for improving the development and evaluation of statistical methods in DSM.


Mapping Trustworthiness in Large Language Models: A Bibliometric Analysis Bridging Theory to Practice

arXiv.org Artificial Intelligence

The rapid proliferation of Large Language Models (LLMs) has raised pressing concerns regarding their trustworthiness, spanning issues of reliability, transparency, fairness, and ethical alignment. Despite the increasing adoption of LLMs across various domains, there remains a lack of consensus on how to operationalize trustworthiness in practice. This study bridges the gap between theoretical discussions and implementation by conducting a bibliometric mapping analysis of 2,006 publications from 2019 to 2025. Through co-authorship networks, keyword co-occurrence analysis, and thematic evolution tracking, we identify key research trends, influential authors, and prevailing definitions of LLM trustworthiness. Additionally, a systematic review of 68 core papers is conducted to examine conceptualizations of trust and their practical implications. Our findings reveal that trustworthiness in LLMs is often framed through existing organizational trust frameworks, emphasizing dimensions such as ability, benevolence, and integrity. However, a significant gap exists in translating these principles into concrete development strategies. To address this, we propose a structured mapping of 20 trust-enhancing techniques across the LLM lifecycle, including retrieval-augmented generation (RAG), explainability techniques, and post-training audits. By synthesizing bibliometric insights with practical strategies, this study contributes towards fostering more transparent, accountable, and ethically aligned LLMs, ensuring their responsible deployment in real-world applications.


LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces

arXiv.org Artificial Intelligence

We introduce the Local Intersectional Visual Spaces (LIVS) dataset, a benchmark for multi-criteria alignment of text-to-image (T2I) models in inclusive urban planning. Developed through a two-year participatory process with 30 community organizations, LIVS encodes diverse spatial preferences across 634 initial concepts, consolidated into six core criteria: Accessibility, Safety, Comfort, Invitingness, Inclusivity, and Diversity, through 37,710 pairwise comparisons. Using Direct Preference Optimization (DPO) to fine-tune Stable Diffusion XL, we observed a measurable increase in alignment with community preferences, though a significant proportion of neutral ratings highlights the complexity of modeling intersectional needs. Additionally, as annotation volume increases, accuracy shifts further toward the DPO-tuned model, suggesting that larger-scale preference data enhances fine-tuning effectiveness. LIVS underscores the necessity of integrating context-specific, stakeholder-driven criteria into generative modeling and provides a resource for evaluating AI alignment methodologies across diverse socio-spatial contexts.