Radio Astronomy in the Era of Vision-Language Models: Prompt Sensitivity and Adaptation

Drozdova, Mariia, Lastufka, Erica, Kinakh, Vitaliy, Holotyak, Taras, Schaerer, Daniel, Voloshynovskiy, Slava

arXiv.org Artificial Intelligence

Vision-Language Models (VLMs), such as recent Qwen and Gemini models, are positioned as general-purpose AI systems capable of reasoning across domains. Yet their capabilities in scientific imaging, especially on unfamiliar and potentially previously unseen data distributions, remain poorly understood. In this work, we assess whether generic VLMs, presumed to lack exposure to astronomical corpora, can perform morphology-based classification of radio galaxies using the MiraBest FR-I/FR-II dataset. We explore prompting strategies using natural language and schematic diagrams, and, to the best of our knowledge, we are the first to introduce visual in-context examples within prompts in astronomy. Additionally, we evaluate lightweight supervised adaptation via LoRA fine-tuning. Our findings reveal three trends: (i) even prompt-based approaches can achieve good performance, suggesting that VLMs encode useful priors for unfamiliar scientific domains; (ii) however, outputs are highly unstable, i.e., varying sharply with superficial prompt changes such as layout, ordering, or decoding temperature, even when semantic content is held constant; and (iii) with just 15M trainable parameters and no astronomy-specific pretraining, fine-tuned Qwen-VL achieves near state-of-the-art performance (3% error rate), rivaling domain-specific models. These results suggest that the apparent "reasoning" of VLMs often reflects prompt sensitivity rather than genuine inference, raising caution for their use in scientific domains. At the same time, with minimal adaptation, generic VLMs can rival specialized models, offering a promising but fragile tool for scientific discovery.
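
The low-parameter fine-tuning result is the kind of setup that adapter libraries make cheap to reproduce. Below is a minimal sketch of attaching a LoRA adapter to a Qwen vision-language model using the Hugging Face transformers and peft APIs; the checkpoint name, rank, and target modules are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal LoRA sketch for a Qwen VLM, assuming Hugging Face transformers + peft.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

model_name = "Qwen/Qwen2-VL-2B-Instruct"  # illustrative checkpoint choice
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(model_name, torch_dtype=torch.bfloat16)

lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # a few M trainable params; the base model stays frozen
```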


Region-Aware Reconstruction Strategy for Pre-training fMRI Foundation Model

Doodipala, Ruthwik Reddy, Pandey, Pankaj, Rojas, Carolina Torres, Saikia, Manob Jyoti, Sitaram, Ranganatha

arXiv.org Artificial Intelligence

The emergence of foundation models in neuroimaging is driven by the increasing availability of large-scale and heterogeneous brain imaging datasets. Recent advances in self-supervised learning, particularly reconstruction-based objectives, have demonstrated strong potential for pretraining models that generalize effectively across diverse downstream functional MRI (fMRI) tasks. In this study, we explore region-aware reconstruction strategies for a foundation model in resting-state fMRI, moving beyond approaches that rely on random region masking. Specifically, we introduce an ROI-guided masking strategy using the Automated Anatomical Labelling Atlas (AAL3), applied directly to full 4D fMRI volumes to selectively mask semantically coherent brain regions during self-supervised pretraining. Using the ADHD-200 dataset comprising 973 subjects with resting-state fMRI scans, we show that our method achieves a 4.23% improvement in classification accuracy for distinguishing healthy controls from individuals diagnosed with ADHD, compared to conventional random masking. Region-level attribution analysis reveals that brain volumes within the limbic region and cerebellum contribute most significantly to reconstruction fidelity and model representation. Our results demonstrate that masking anatomical regions during model pretraining not only enhances interpretability but also yields more robust and discriminative representations. In future work, we plan to extend this approach by evaluating it on additional neuroimaging datasets, and developing new loss functions explicitly derived from region-aware reconstruction objectives. These directions aim to further improve the robustness and interpretability of foundation models for functional neuroimaging.
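
As a concrete illustration of the masking strategy, the sketch below zeroes out whole atlas-defined regions of a 4D scan rather than random patches. It uses NumPy only; the array shapes and label conventions are assumptions, not the paper's implementation.

```python
# Minimal sketch of ROI-guided masking for self-supervised pretraining:
# mask a sampled subset of atlas regions at every timepoint of the scan,
# and train the model to reconstruct the hidden voxels.
import numpy as np

def roi_masked_input(fmri, atlas, mask_ratio=0.3, rng=None):
    """fmri: (T, X, Y, Z) resting-state scan; atlas: (X, Y, Z) integer ROI labels."""
    rng = np.random.default_rng() if rng is None else rng
    rois = np.unique(atlas)
    rois = rois[rois > 0]                      # 0 = background
    n_mask = max(1, int(mask_ratio * len(rois)))
    masked_rois = rng.choice(rois, size=n_mask, replace=False)
    mask = np.isin(atlas, masked_rois)         # (X, Y, Z) boolean region mask
    x = fmri.copy()
    x[:, mask] = 0.0                           # mask the same regions across all timepoints
    return x, mask                             # reconstruction target is fmri[:, mask]
```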


Exploring Cross-Client Memorization of Training Data in Large Language Models for Federated Learning

Udsa, Tinnakit, Udomcharoenchaikit, Can, Payoungkhamdee, Patomporn, Nutanong, Sarana, Rattanavipanon, Norrathep

arXiv.org Artificial Intelligence

Federated learning (FL) enables collaborative training without raw data sharing, but still risks training data memorization. Existing FL memorization detection techniques focus on one sample at a time, underestimating the more subtle risks of cross-sample memorization. In contrast, recent work on centralized learning (CL) has introduced fine-grained methods to assess memorization across all samples in the training data, but these assume centralized access to data and cannot be applied directly to FL. We bridge this gap by proposing a framework that quantifies both intra- and inter-client memorization in FL using fine-grained cross-sample memorization measurement across all clients. Based on this framework, we conduct two studies: (1) measuring subtle memorization across clients and (2) examining key factors that influence memorization, including decoding strategies, prefix length, and FL algorithms. Our findings reveal that FL models do memorize client data, with intra-client data memorized more strongly than inter-client data, and that memorization is influenced by both training and inference factors.
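
To make the intra- vs inter-client distinction concrete, here is a minimal sketch using a simple n-gram overlap score as the memorization measure; the paper's actual fine-grained metric may differ, and `model_continue` is a hypothetical stand-in for prompting the trained global model with a prefix.

```python
# Minimal sketch: score how much of a model continuation appears verbatim
# in each client's training data, split into intra- vs inter-client overlap.
from collections import defaultdict

def ngrams(tokens, n=5):
    return {tuple(tokens[k:k + n]) for k in range(len(tokens) - n + 1)}

def memorization_scores(model_continue, client_data, n=5):
    """client_data: {client_id: [token lists]}; model_continue: prefix -> tokens."""
    corpus_ngrams = {c: set().union(*(ngrams(s, n) for s in samples))
                     for c, samples in client_data.items()}
    scores = defaultdict(list)
    for src, samples in client_data.items():
        for sample in samples:
            out = ngrams(model_continue(sample[:len(sample) // 2]), n)
            if not out:
                continue
            for tgt, bank in corpus_ngrams.items():
                kind = "intra" if tgt == src else "inter"
                scores[kind].append(len(out & bank) / len(out))
    return {kind: sum(v) / len(v) for kind, v in scores.items()}
```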


MEGS$^{2}$: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning

Chen, Jiarui, Chen, Yikeng, Zou, Yingshuang, Huang, Ye, Wang, Peng, Liu, Yuan, Sun, Yujing, Wang, Wenping

arXiv.org Artificial Intelligence

3D Gaussian Splatting (3DGS) has emerged as a dominant novel-view synthesis technique, but its high memory consumption severely limits its applicability on edge devices. A growing number of 3DGS compression methods have been proposed to make 3DGS more efficient, yet most only focus on storage compression and fail to address the critical bottleneck of rendering memory. To address this problem, we introduce MEGS$^{2}$, a novel memory-efficient framework that tackles this challenge by jointly optimizing two key factors: the total primitive number and the parameters per primitive, achieving unprecedented memory compression. Specifically, we replace the memory-intensive spherical harmonics with lightweight, arbitrarily oriented spherical Gaussian lobes as our color representations. More importantly, we propose a unified soft pruning framework that models primitive-number and lobe-number pruning as a single constrained optimization problem. Experiments show that MEGS$^{2}$ achieves a 50% static VRAM reduction and a 40% rendering VRAM reduction compared to existing methods, while maintaining comparable rendering quality. Project page: https://megs-2.github.io/
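
For intuition about the color representation, the sketch below evaluates a view-dependent color as a small sum of oriented spherical Gaussian lobes, c(d) = c_diffuse + sum_k a_k * exp(lambda_k * (dot(d, mu_k) - 1)), which needs far fewer parameters per primitive than degree-3 spherical harmonics. Shapes and the exact parameterization are illustrative assumptions, not the paper's formulation.

```python
# Minimal sketch of spherical-Gaussian color evaluation for one primitive.
import numpy as np

def sg_color(view_dir, c_diffuse, mu, lam, a):
    """view_dir: (3,) unit vector; mu: (K, 3) unit lobe axes;
    lam: (K,) sharpness >= 0; a: (K, 3) per-lobe RGB amplitudes."""
    cos = mu @ view_dir               # (K,) alignment of the view with each lobe
    w = np.exp(lam * (cos - 1.0))     # (K,) lobe responses, peaking at 1 when aligned
    return c_diffuse + w @ a          # (3,) RGB color for this viewing direction
```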


Unlocking Neural Transparency: Jacobian Maps for Explainable AI in Alzheimer's Detection

Mustafa, Yasmine, Elmahallawy, Mohamed, Luo, Tie

arXiv.org Artificial Intelligence

Alzheimer's disease (AD) leads to progressive cognitive decline, making early detection crucial for effective intervention. While deep learning models have shown high accuracy in AD diagnosis, their lack of interpretability limits clinical trust and adoption. This paper introduces a novel pre-model approach leveraging Jacobian Maps (JMs) within a multi-modal framework to enhance explainability and trustworthiness in AD detection. By capturing localized brain volume changes, JMs establish meaningful correlations between model predictions and well-known neuroanatomical biomarkers of AD. We validate JMs through experiments comparing a 3D CNN trained on JMs with one trained on traditionally preprocessed data, demonstrating superior accuracy for the JM-trained model. We also employ 3D Grad-CAM analysis to provide both visual and quantitative insights, further showcasing improved interpretability and diagnostic reliability.
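
As background for what a Jacobian Map encodes, the sketch below computes a Jacobian-determinant map from a displacement field obtained by registering a subject brain to a template: the determinant of I + du/dx gives the local volume change, with values below 1 indicating contraction (e.g., atrophy). Voxel units and field conventions are assumptions, not the paper's pipeline.

```python
# Minimal sketch: Jacobian-determinant map from a dense displacement field.
import numpy as np

def jacobian_map(disp):
    """disp: (X, Y, Z, 3) displacement field in voxel units."""
    grads = [np.gradient(disp[..., i]) for i in range(3)]             # du_i/dx_j
    J = np.stack([np.stack(g, axis=-1) for g in grads], axis=-2)      # (X, Y, Z, 3, 3)
    J += np.eye(3)                         # full transform is identity + displacement
    return np.linalg.det(J)                # (X, Y, Z) local volume-change map
```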


Bronchovascular Tree-Guided Weakly Supervised Learning Method for Pulmonary Segment Segmentation

Zhao, Ruijie, Tan, Zuopeng, Xue, Xiao, Zhao, Longfei, Li, Bing, Liao, Zicheng, Ming, Ying, Wang, Jiaru, Xiao, Ran, Piao, Sirong, Zhao, Rui, Xu, Qiqi, Song, Wei

arXiv.org Artificial Intelligence

Pulmonary segment segmentation is crucial for cancer localization and surgical planning. However, the pixel-wise annotation of pulmonary segments is laborious, as the boundaries between segments are indistinguishable in medical images. To this end, we propose a weakly supervised learning (WSL) method, termed Anatomy-Hierarchy Supervised Learning (AHSL), which consults the precise clinical anatomical definition of pulmonary segments to perform pulmonary segment segmentation. Since pulmonary segments reside within the lobes and are determined by the bronchovascular tree, i.e., artery, airway and vein, the design of the loss function is founded on two principles. First, segment-level labels are utilized to directly supervise the output of the pulmonary segments, ensuring that they accurately encompass the appropriate bronchovascular tree. Second, lobe-level supervision indirectly oversees the pulmonary segments, ensuring their inclusion within the corresponding lobe. Besides, we introduce a two-stage segmentation strategy that incorporates bronchovascular prior information. Furthermore, a consistency loss is proposed to enhance the smoothness of segment boundaries, along with an evaluation metric designed to measure the smoothness of pulmonary segment boundaries. Visual inspection and evaluation metrics from experiments conducted on a private dataset demonstrate the effectiveness of our method.
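
A minimal sketch of how the two supervision levels could be combined into one objective, assuming PyTorch tensors: segment predictions are supervised directly on labelled bronchovascular voxels, while each segment's probability mass is penalized for falling outside its parent lobe. The tensor layout and the inclusion penalty are illustrative assumptions, not the paper's exact losses.

```python
# Minimal sketch of a two-level anatomy-hierarchy loss.
import torch
import torch.nn.functional as F

def ahsl_loss(seg_logits, tree_labels, lobe_mask, seg_to_lobe):
    """seg_logits: (B, S, X, Y, Z); tree_labels: (B, X, Y, Z) long tensor of segment
    ids on artery/airway/vein voxels (-1 elsewhere); lobe_mask: (B, L, X, Y, Z) in
    {0, 1}; seg_to_lobe: (S,) long tensor mapping each segment to its parent lobe."""
    # Segment level: cross-entropy only on the labelled bronchovascular voxels.
    ce = F.cross_entropy(seg_logits, tree_labels.clamp(min=0), reduction="none")
    labelled = tree_labels >= 0
    seg_loss = (ce * labelled).sum() / labelled.sum().clamp(min=1)
    # Lobe level: penalize segment probability mass outside the parent lobe.
    probs = seg_logits.softmax(dim=1)                  # (B, S, X, Y, Z)
    outside = 1.0 - lobe_mask[:, seg_to_lobe]          # (B, S, X, Y, Z)
    lobe_loss = (probs * outside).mean()
    return seg_loss + lobe_loss
```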


ReXplain: Translating Radiology into Patient-Friendly Video Reports

Luo, Luyang, Vairavamurthy, Jenanan, Zhang, Xiaoman, Kumar, Abhinav, Ter-Oganesyan, Ramon R., Schroff, Stuart T., Shilo, Dan, Hossain, Rydhwana, Moritz, Mike, Rajpurkar, Pranav

arXiv.org Artificial Intelligence

Radiology reports, designed for efficient communication between medical experts, often remain incomprehensible to patients. This inaccessibility can lead to anxiety, decreased engagement in treatment decisions, and poorer health outcomes, undermining patient-centered care. We present ReXplain (Radiology eXplanation), an AI-driven system that translates radiology findings into patient-friendly video reports. ReXplain integrates a large language model for medical text simplification and text-anatomy association, an image segmentation model for anatomical region identification, and an avatar generation tool for an engaging interface. ReXplain produces comprehensive explanations in plain language, with highlighted imagery and 3D organ renderings, in the form of video reports. To evaluate the utility of ReXplain-generated explanations, we conducted two rounds of user feedback collection from six board-certified radiologists. The results of this proof-of-concept study indicate that ReXplain can accurately deliver radiological information and effectively simulate one-on-one consultation, pointing toward enhanced patient-centered radiology with potential clinical use. This work demonstrates a new paradigm in AI-assisted medical communication, potentially improving patient engagement and satisfaction in radiology care, and opens new avenues for research in multimodal medical communication.
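
The system is essentially a pipeline of three components; the sketch below shows one plausible orchestration, where `simplify_finding`, `segment_anatomy`, and `render_avatar_clip` are hypothetical stand-ins for the paper's LLM, segmentation model, and avatar tool.

```python
# Hypothetical orchestration of a ReXplain-style pipeline: each report finding
# becomes one video clip combining plain-language narration and a highlighted region.
def explain_report(findings, ct_volume, simplify_finding, segment_anatomy,
                   render_avatar_clip):
    clips = []
    for finding in findings:
        plain_text, anatomy = simplify_finding(finding)   # LLM: simplify + localize text
        highlight = segment_anatomy(ct_volume, anatomy)   # segmentation: region mask
        clips.append(render_avatar_clip(plain_text, highlight))  # avatar narration
    return clips                                          # concatenated into the video report
```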


PadChest-GR: A Bilingual Chest X-ray Dataset for Grounded Radiology Report Generation

Castro, Daniel C., Bustos, Aurelia, Bannur, Shruthi, Hyland, Stephanie L., Bouzid, Kenza, Wetscherek, Maria Teodora, Sánchez-Valverde, Maria Dolores, Jaques-Pérez, Lara, Pérez-Rodríguez, Lourdes, Takeda, Kenji, Salinas, José María, Alvarez-Valle, Javier, Herrero, Joaquín Galant, Pertusa, Antonio

arXiv.org Artificial Intelligence

Radiology report generation (RRG) aims to create free-text radiology reports from clinical imaging. Grounded radiology report generation (GRRG) extends RRG by including the localisation of individual findings on the image. Currently, there are no manually annotated chest X-ray (CXR) datasets to train GRRG models. In this work, we present a dataset called PadChest-GR (Grounded-Reporting), derived from PadChest, aimed at training GRRG models for CXR images. We curate a public bilingual dataset of 4,555 CXR studies with grounded reports (3,099 abnormal and 1,456 normal), each containing complete lists of sentences describing individual present (positive) and absent (negative) findings in English and Spanish. In total, PadChest-GR contains 7,037 positive and 3,422 negative finding sentences. Every positive finding sentence is associated with up to two independent sets of bounding boxes labelled by different readers and has categorical labels for finding type, locations, and progression. To the best of our knowledge, PadChest-GR is the first manually curated dataset designed to train GRRG models for understanding and interpreting radiological images and generated text. By including detailed localisation and comprehensive annotations of all clinically relevant findings, it provides a valuable resource for developing and evaluating GRRG models from CXR images. PadChest-GR can be downloaded upon request from https://bimcv.cipf.es/bimcv-projects/padchest-gr/
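
One way to picture the annotation schema is as a record per finding sentence. The dataclass below is an illustrative guess at such a structure based only on the description above; the field names are assumptions, not the dataset's actual schema.

```python
# Hypothetical record for one PadChest-GR finding sentence (illustrative only).
from dataclasses import dataclass, field

@dataclass
class GroundedFinding:
    sentence_en: str                  # the finding sentence in English
    sentence_es: str                  # the same sentence in Spanish
    positive: bool                    # present (positive) vs absent (negative) finding
    finding_type: str                 # categorical finding-type label
    locations: list[str] = field(default_factory=list)
    progression: str | None = None
    # Up to two independent sets of (x0, y0, x1, y1) boxes, one list per reader;
    # negative findings carry no boxes.
    boxes_by_reader: list[list[tuple[float, float, float, float]]] = field(
        default_factory=list)
```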


Development and Validation of a Dynamic-Template-Constrained Large Language Model for Generating Fully-Structured Radiology Reports

Niu, Chuang, Kaviani, Parisa, Lyu, Qing, Kalra, Mannudeep K., Whitlow, Christopher T., Wang, Ge

arXiv.org Artificial Intelligence

Current LLMs for creating fully-structured reports face the challenges of formatting errors, content hallucinations, and privacy leakage when uploading data to external servers. We aim to develop an open-source, accurate LLM for creating fully-structured and standardized lung cancer screening (LCS) reports from varying free-text reports across institutions, and to demonstrate its utility in automatic statistical analysis and individual lung nodule retrieval. With IRB approvals, our retrospective study included 5,442 de-identified low-dose CT (LDCT) LCS radiology reports from two institutions. We constructed two evaluation datasets by labeling 500 pairs of free-text and fully-structured radiology reports and one large-scale consecutive dataset from January 2021 to December 2023. Two radiologists created a standardized template for recording 27 lung nodule features on LCS. We designed a dynamic-template-constrained decoding method to enhance existing LLMs for creating fully-structured reports from free-text radiology reports. Using the consecutive structured reports, we automated descriptive statistical analyses and a nodule retrieval prototype. Our best LLM for creating fully-structured reports achieved high performance on cross-institutional datasets with an F1 score of about 97%, with neither formatting errors nor content hallucinations. Our method consistently improved the best open-source LLMs by up to 10.42%, and outperformed GPT-4o by 17.19%. The automatically derived statistical distributions were consistent with prior findings regarding attenuation, location, size, stability, and Lung-RADS. The retrieval system with structured reports allowed flexible nodule-level search and complex statistical analysis. Our software is publicly available for local deployment and further research.
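
The core idea of template-constrained decoding is that fixed template tokens are emitted verbatim while the model only generates inside value slots, so formatting errors become impossible by construction. The sketch below shows this under an assumed segment encoding; `step_logits_fn` is a hypothetical closure over the model, and the paper's actual mechanism may differ.

```python
# Minimal sketch of template-constrained greedy decoding: the template is a
# sequence of fixed spans and free value slots; only slots consult the model.
import torch

def constrained_decode(step_logits_fn, template_segments, tokenizer, max_slot_len=16):
    """template_segments: list of ("fixed", token_ids) or ("slot", stop_token_id)."""
    out = []
    for kind, payload in template_segments:
        if kind == "fixed":
            out.extend(payload)                  # copy template tokens verbatim
        else:
            for _ in range(max_slot_len):        # let the model fill the value slot
                logits = step_logits_fn(out)     # (vocab,) next-token logits
                tok = int(torch.argmax(logits))  # greedy choice inside the slot
                if tok == payload:               # slot's designated stop token
                    break
                out.append(tok)
    return tokenizer.decode(out)
```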


The Geometry of Concepts: Sparse Autoencoder Feature Structure

Li, Yuxiao, Michaud, Eric J., Baek, David D., Engels, Joshua, Sun, Xiaoqing, Tegmark, Max

arXiv.org Artificial Intelligence

Sparse autoencoders have recently produced dictionaries of high-dimensional vectors corresponding to the universe of concepts represented by large language models. We find that this concept universe has interesting structure at three levels: 1) The "atomic" small-scale structure contains "crystals" whose faces are parallelograms or trapezoids, generalizing well-known examples such as (man-woman-king-queen). We find that the quality of such parallelograms and associated function vectors improves greatly when projecting out global distractor directions such as word length, which is efficiently done with linear discriminant analysis. 2) The "brain" intermediate-scale structure has significant spatial modularity; for example, math and code features form a "lobe" akin to functional lobes seen in neural fMRI images. We quantify the spatial locality of these lobes with multiple metrics and find that clusters of co-occurring features, at coarse enough scale, also cluster together spatially far more than one would expect if feature geometry were random. 3) The "galaxy" scale large-scale structure of the feature point cloud is not isotropic, but instead has a power law of eigenvalues with steepest slope in middle layers. We also quantify how the clustering entropy depends on the layer.
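
To make the "crystal" analysis concrete, the sketch below projects feature vectors off a span of distractor directions and scores how close a quadruple comes to a parallelogram (b - a ≈ d - c, as in man-woman-king-queen). The choice of distractor directions, e.g., vectors fit to predict word length, is an illustrative assumption; the paper uses linear discriminant analysis for this step.

```python
# Minimal sketch: remove distractor directions, then test parallelogram quality.
import numpy as np

def project_out(vectors, distractors):
    """Remove the span of `distractors` (D, dim) from `vectors` (N, dim)."""
    Q, _ = np.linalg.qr(distractors.T)        # orthonormal basis of the distractor span
    return vectors - (vectors @ Q) @ Q.T      # component orthogonal to that span

def parallelogram_error(a, b, c, d):
    """0 when b - a exactly equals d - c (a perfect parallelogram)."""
    diff = (b - a) - (d - c)
    return np.linalg.norm(diff) / (np.linalg.norm(b - a) + np.linalg.norm(d - c))
```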