MV-MLM: Bridging Multi-View Mammography and Language for Breast Cancer Diagnosis and Risk Prediction

Zheng, Shunjie-Fabian, Lee, Hyeonjun, Kooi, Thijs, Diba, Ali

arXiv.org Artificial Intelligence

Large annotated datasets are essential for training robust Computer-Aided Diagnosis (CAD) models for breast cancer detection or risk prediction. However, acquiring such datasets with fine-grained annotations is both costly and time-consuming. Vision-Language Models (VLMs), such as CLIP, which are pre-trained on large collections of image-text pairs, offer a promising solution by enhancing robustness and data efficiency in medical imaging tasks. This paper introduces MV-MLM, a novel Multi-View Mammography and Language Model for breast cancer classification and risk prediction, trained on a dataset of paired mammogram images and synthetic radiology reports. MV-MLM leverages multi-view supervision to learn rich representations from extensive radiology data by employing cross-modal self-supervision across image-text pairs, covering multiple views and their corresponding pseudo-radiology reports. We propose a novel joint visual-textual learning strategy to improve generalization and accuracy across data types and tasks, distinguishing breast tissues and cancer characteristics (calcification, mass) and using these patterns to interpret mammography images and predict cancer risk. We evaluated our method on both private and publicly available datasets, demonstrating that the proposed model achieves state-of-the-art performance on three classification tasks: (1) malignancy classification, (2) subtype classification, and (3) image-based cancer risk prediction. Furthermore, the model exhibits strong data efficiency, outperforming existing fully supervised and VLM baselines while being trained on synthetic text reports, without the need for actual radiology reports.
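
The abstract does not spell out the training objective, but the cross-modal self-supervision it describes is of the CLIP family. Below is a minimal sketch of such a symmetric image-text contrastive loss, assuming a two-view setup where craniocaudal (CC) and mediolateral-oblique (MLO) embeddings are fused before being matched to the pseudo-report embedding; all function names here are illustrative, not the paper's code:

```python
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    img_emb, txt_emb: (B, D) outputs of the image and text encoders.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # (B, B) cosine similarities
    targets = torch.arange(logits.shape[0], device=logits.device)
    # Matched pairs sit on the diagonal; all other pairs in the batch are negatives.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def fuse_views(cc_emb, mlo_emb):
    # Hypothetical multi-view fusion: combine the per-view embeddings of one
    # breast before contrasting them with the pseudo-report embedding.
    return F.normalize(cc_emb + mlo_emb, dim=-1)
```

Training with this loss pulls each fused mammogram embedding toward its own synthetic report and away from the other reports in the batch, which is the standard way such models gain data efficiency without manual labels.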



LLMs on a Budget? Say HOLA

Siddiqui, Zohaib Hasan, Gao, Jiechao, Shabbir, Ebad, Azeez, Mohammad Anas, Ali, Rafiq, Kashyap, Gautam Siddharth, Naseem, Usman

arXiv.org Artificial Intelligence

Running Large Language Models (LLMs) on edge devices is constrained by high compute and memory demands, posing a barrier to real-time applications in sectors like healthcare, education, and embedded systems. Current solutions such as quantization, pruning, and retrieval-augmented generation (RAG) offer only partial optimizations and often compromise on speed or accuracy. We introduce HOLA, an end-to-end optimization framework for efficient LLM deployment. Internally, it leverages Hierarchical Speculative Decoding (HSD) for faster inference without quality loss. Externally, AdaComp-RAG adjusts retrieval complexity based on context needs. Together with LoBi, which blends structured pruning (LoRA) and quantization, HOLA delivers significant gains: 17.6% EMA on GSM8K, 10.5% MCA on ARC, and reduced latency and memory on edge devices like the Jetson Nano, proving the approach both scalable and production-ready.
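
The abstract leaves HSD's hierarchy unspecified; as background, the sketch below shows one round of plain speculative decoding, the mechanism HSD builds on. The draft/target models (any HF-style causal LMs exposing .logits) and the greedy accept-longest-agreeing-prefix rule are assumptions for illustration, not HOLA's implementation:

```python
import torch

@torch.no_grad()
def speculative_step(draft, target, ids, k=4):
    """One round of vanilla speculative decoding (greedy variant).

    draft, target: causal LMs returning .logits of shape (1, T, V)
    (HF-style interface assumed); ids: (1, T) prompt token ids.
    """
    # 1) The cheap draft model proposes k tokens autoregressively.
    proposal = ids
    for _ in range(k):
        nxt = draft(proposal).logits[:, -1].argmax(-1, keepdim=True)
        proposal = torch.cat([proposal, nxt], dim=-1)
    # 2) The large target model scores all k drafted tokens in one pass.
    tgt = target(proposal).logits[:, ids.shape[1] - 1:-1].argmax(-1)
    drafted = proposal[:, ids.shape[1]:]
    # 3) Accept the longest prefix on which draft and target agree.
    agree = int((drafted == tgt).long().cumprod(-1).sum())
    fixup = tgt[:, agree:agree + 1]  # target's token at first disagreement (empty if none)
    return torch.cat([ids, drafted[:, :agree], fixup], dim=-1)
```

Because the target validates k tokens per forward pass instead of one, wall-clock latency drops whenever the draft's acceptance rate is high; presumably HSD stacks several such draft levels, though the abstract does not say.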



Neural Information Processing Systems

We thank the reviewers for their valuable feedback. This rebuttal includes further experiments to address the reviewers' remarks, and improved experimental results on CIFAR10-binary, finding a model with 76.83% accuracy and WM ≤ 2KB and a model with 74.87% accuracy and WM, MS ≤ 2KB, both of which outperform Bonsai. These ablation results support the design choices made in SpArSe in the context of memory-constrained MCUs. On MNIST, SpArSe achieves 99.17% accuracy with 1.45e3 parameters, compared to 99.15% accuracy. SpArSe would not work with the design choices made in previous NAS works, especially [23]. Reproducibility (R1): We are happy to make the implementation publicly available upon acceptance. We argue that: 1) SpArSe addresses a significant gap in the community, i.e. model design for memory-constrained MCUs. Validity of claim on line 66 (R1): Our claim is true for WM ≤ 2KB, but we will revise that sentence for clarity.
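
For context on the WM (working memory) and MS (model size) budgets quoted above, the sketch below estimates both for a toy CNN under an assumed int8, single-activation-buffer execution model; the network and the cost model are illustrative, not SpArSe's actual estimator:

```python
import torch
import torch.nn as nn

# Toy CNN in the MCU regime; the architecture is illustrative only.
net = nn.Sequential(
    nn.Conv2d(1, 4, 3, stride=2), nn.ReLU(),
    nn.Conv2d(4, 8, 3, stride=2), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 6 * 6, 10),
)

def model_size_bytes(model, bytes_per_param=1):
    # MS: parameter count times storage width (1 byte assumes int8 quantization).
    return sum(p.numel() for p in model.parameters()) * bytes_per_param

def working_memory_bytes(model, x, bytes_per_act=1):
    # WM: crude peak of (input + output) activations across layers, as in
    # sequential execution with one input and one output buffer live at a time.
    peak = 0
    for layer in model:
        y = layer(x)
        peak = max(peak, (x.numel() + y.numel()) * bytes_per_act)
        x = y
    return peak

x = torch.zeros(1, 1, 28, 28)  # MNIST-sized input
print(model_size_bytes(net), working_memory_bytes(net, x))
```

Under these assumptions, the two printed numbers are what a search procedure would check against MS and WM budgets such as the 2KB constraints discussed in the rebuttal.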