model variant
A Novel Attention-Augmented Wavelet YOLO System for Real-time Brain Vessel Segmentation on Transcranial Color-coded Doppler
Zhang, Wenxuan, Li, Shuai, Wang, Xinyi, Sun, Yu, Kang, Hongyu, Wan, Pui Yuk Chryste, Qin, Jing, Zhang, Yuanpeng, Zheng, Yong-Ping, Lam, Sai-Kit
The Circle of Willis (CoW), vital for ensuring consistent blood flow to the brain, is closely linked to ischemic stroke. Accurate assessment of the CoW is important for identifying individuals at risk and guiding appropriate clinical management. Among existing imaging methods, Transcranial Color-coded Doppler (TCCD) offers unique advantages due to its radiation-free nature, affordability, and accessibility. However, reliable TCCD assessments depend heavily on operator expertise for identifying anatomical landmarks and performing accurate angle correction, which limits its widespread adoption. To address this challenge, we propose an AI-powered, real-time CoW auto-segmentation system capable of efficiently capturing cerebral arteries. No prior studies have explored AI-driven cerebrovascular segmentation using TCCD. In this work, we introduce a novel Attention-Augmented Wavelet YOLO (AAW-YOLO) network tailored for TCCD data, designed to provide real-time guidance for brain vessel segmentation in the CoW. We prospectively collected TCCD data comprising 738 annotated frames and 3,419 labeled artery instances to establish a high-quality dataset for model training and evaluation. The proposed AAW-YOLO demonstrated strong performance in segmenting both ipsilateral and contralateral CoW vessels, achieving an average Dice score of 0.901, IoU of 0.823, precision of 0.882, recall of 0.926, and mAP of 0.953, with a per-frame inference speed of 14.199 ms. This system offers a practical solution to reduce reliance on operator experience in TCCD-based cerebrovascular screening, with potential applications in routine clinical workflows and resource-constrained settings. Future research will explore bilateral modeling and larger-scale validation.
- Asia > China > Hong Kong (0.05)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Therapeutic Area > Hematology (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
AutoTailor: Automatic and Efficient Adaptive Model Deployment for Diverse Edge Devices
Liu, Mengyang, Lu, Chenyu, Tian, Haodong, Dong, Fang, Zhou, Ruiting, Wang, Wei, Shen, Dian, Li, Guangtong, Wan, Ye, Li, Li
On-device machine learning (ML) has become a fundamental component of emerging mobile applications. Adaptive model deployment delivers efficient inference for heterogeneous device capabilities and performance requirements through customizing neural architectures. SuperNet-based approaches offer a promising solution by generating a large number of model variants from a pre-trained ML model. However, applying SuperNet in existing frameworks suffers from tedious model-aware development and time-consuming hardware-aware profiling, which limits their practical adoption. We present AutoTailor, the first framework to enable automated, end-to-end SuperNet-based adaptive model deployment for edge devices. Unlike manual SuperNet construction, AutoTailor employs a computation graph-guided compilation approach to automatically transform user-provided ML models into SuperNets. To support efficient specialization, AutoTailor incorporates learning-free latency and accuracy predictors, enabling low-cost yet accurate performance prediction. Our extended evaluations demonstrate that AutoTailor reduces the lines of code for SuperNet construction by 11--27$\times$, decreases hardware-aware profiling costs by at least 11$\times$, and achieves up to 15.60\% absolute accuracy improvement and 60.03\% latency reduction compared to state-of-the-art approaches across diverse models and devices.
- North America > United States > Massachusetts > Suffolk County > Boston (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
- North America > United States > District of Columbia > Washington (0.04)
- (28 more...)
Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque
Sainz, Oscar, Perez, Naiara, Etxaniz, Julen, de Landa, Joseba Fernandez, Aldabe, Itziar, García-Ferrero, Iker, Zabala, Aimar, Azurmendi, Ekhi, Rigau, German, Agirre, Eneko, Artetxe, Mikel, Soroa, Aitor
Instructing language models with user intent requires large instruction datasets, which are only available for a limited set of languages. In this paper, we explore alternatives to conventional instruction adaptation pipelines in low-resource scenarios. We assume a realistic scenario for low-resource languages, where only the following are available: corpora in the target language, existing open-weight multilingual base and instructed backbone LLMs, and synthetically generated instructions sampled from the instructed backbone. We present a comprehensive set of experiments for Basque that systematically study different combinations of these components evaluated on benchmarks and human preferences from 1,680 participants. Our conclusions show that target language corpora are essential, with synthetic instructions yielding robust models, and, most importantly, that using as backbone an instruction-tuned model outperforms using a base non-instructed model. Scaling up to Llama 3.1 Instruct 70B as backbone, our model comes near frontier models of much larger sizes for Basque, without using any Basque instructions. We release code, models, instruction datasets, and human preferences to support full reproducibility in future research on low-resource language adaptation. https://github.com/hitz-zentroa/latxa-instruct
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- (13 more...)
Realistic pedestrian-driver interaction modelling using multi-agent RL with human perceptual-motor constraints
Wang, Yueyang, Dogar, Mehmet, Markkula, Gustav
Modelling pedestrian-driver interactions is critical for understanding human road user behaviour and developing safe autonomous vehicle systems. Existing approaches often rely on rule-based logic, game-theoretic models, or 'black-box' machine learning methods. However, these models typically lack flexibility or overlook the underlying mechanisms, such as sensory and motor constraints, which shape how pedestrians and drivers perceive and act in interactive scenarios. In this study, we propose a multi-agent reinforcement learning (RL) framework that integrates both visual and motor constraints of pedestrian and driver agents. Using a real-world dataset from an unsignalised pedestrian crossing, we evaluate four model variants, one without constraints, two with either motor or visual constraints, and one with both, across behavioural metrics of interaction realism. Results show that the combined model with both visual and motor constraints performs best. Motor constraints lead to smoother movements that resemble human speed adjustments during crossing interactions. The addition of visual constraints introduces perceptual uncertainty and field-of-view limitations, leading the agents to exhibit more cautious and variable behaviour, such as less abrupt deceleration. In this data-limited setting, our model outperforms a supervised behavioural cloning model, demonstrating that our approach can be effective without large training datasets. Finally, our framework accounts for individual differences by modelling parameters controlling the human constraints as population-level distributions, a perspective that has not been explored in previous work on pedestrian-vehicle interaction modelling. Overall, our work demonstrates that multi-agent RL with human constraints is a promising modelling approach for simulating realistic road user interactions.
- Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Health & Medicine > Therapeutic Area > Neurology (0.67)
- Transportation > Ground > Road (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs
Bian, Song, Yu, Tao, Venkataraman, Shivaram, Park, Youngsuk
Scaling the number of parameters and the size of training data has proven to be an effective strategy for improving large language model (LLM) performance. Y et, as these models grow increasingly powerful and widely deployed, the cost of inference has become a pressing concern. Despite its importance, the tradeoff between model accuracy and inference efficiency remains underexplored. In this work, we examine how key architectural factors, hidden size, the allocation of parameters between MLP and attention (mlp-to-attention ratio), and grouped-query attention (GQA), influence both inference cost and accuracy. We introduce a conditional scaling law that augments the Chinchilla framework with architectural information, along with a search framework for identifying architectures that are simultaneously inference-efficient and accurate. To validate our approach, we train more than 200 models spanning 80M to 3B parameters and 8B to 100B training tokens, and fit the proposed conditional scaling law. Our results show that the conditional scaling law reliably predicts optimal architectural choices and that the resulting models outperform existing open-source baselines. Under the same training budget, optimized architectures achieve up to 2.1% higher accuracy and 42% greater inference throughput compared to LLaMA-3.2.
- Asia > Middle East > Jordan (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Listening or Reading? Evaluating Speech Awareness in Chain-of-Thought Speech-to-Text Translation
Romero-Díaz, Jacobo, Gállego, Gerard I., Pareras, Oriol, Costa, Federico, Hernando, Javier, España-Bonet, Cristina
Speech-to-Text Translation (S2TT) systems built from Automatic Speech Recognition (ASR) and Text-to-Text Translation (T2TT) modules face two major limitations: error propagation and the inability to exploit prosodic or other acoustic cues. Chain-of-Thought (CoT) prompting has recently been introduced, with the expectation that jointly accessing speech and transcription will overcome these issues. Analyzing CoT through attribution methods, robustness evaluations with corrupted transcripts, and prosody-awareness, we find that it largely mirrors cascaded behavior, relying mainly on transcripts while barely leveraging speech. Simple training interventions, such as adding Direct S2TT data or noisy transcript injection, enhance robustness and increase speech attribution. These findings challenge the assumed advantages of CoT and highlight the need for architectures that explicitly integrate acoustic information into translation.
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (4 more...)
The Brain Uses Reliability of Stimulus Information when Making Perceptual Decisions
Sebastian Bitzer, Stefan Kiebel
In simple perceptual decisions the brain has to identify a stimulus based on noisy sensory samples from the stimulus. Basic statistical considerations state that the reliability of the stimulus information, i.e., the amount of noise in the samples, should be taken into account when the decision is made. However, for perceptual decision making experiments it has been questioned whether the brain indeed uses the reliability for making decisions when confronted with unpredictable changes in stimulus reliability. We here show that even the basic drift diffusion model, which has frequently been used to explain experimental findings in perceptual decision making, implicitly relies on estimates of stimulus reliability. We then show that only those variants of the drift diffusion model which allow stimulus-specific reliabilities are consistent with neurophysiological findings. Our analysis suggests that the brain estimates the reliability of the stimulus on a short time scale of at most a few hundred milliseconds.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Germany > Saxony > Dresden (0.04)
The Thinking Therapist: Training Large Language Models to Deliver Acceptance and Commitment Therapy using Supervised Fine-Tuning and Odds Ratio Policy Optimization
Acceptance and Commitment Therapy (ACT) is a third-wave cognitive behavioral therapy with emerging evidence of efficacy in several psychiatric conditions. This study investigates the impact of post-training methodology and explicit reasoning on the ability of a small open-weight large language model (LLM) to deliver ACT. Using synthetic ACT transcripts generated by Mistral-Large, we trained Llama-3.2-3b-Instruct with two distinct approaches, supervised fine-tuning (SFT) and odds ratio policy optimization (ORPO), each with and without an explicit chain-of-thought (COT) reasoning step. Performance was evaluated by comparing these four post-trained variants against the base Instruct model. These models were benchmarked in simulated therapy sessions, with performance quantitatively assessed on the ACT Fidelity Measure (ACT-FM) and the Therapist Empathy Scale (TES) by an LLM judge that had been fine-tuned on human evaluations. Our findings demonstrate that the ORPO-trained models significantly outperformed both their SFT and Instruct counterparts on ACT fidelity ($χ^2(5) = 185.15, p < .001$) and therapeutic empathy ($χ^2(5) = 140.37, p < .001$). The effect of COT was conditional as it provided a significant benefit to SFT models, improving ACT-FM scores by an average of 2.68 points ($p < .001$), while offering no discernible advantage to the superior ORPO or instruct-tuned variants. We posit that the superiority of ORPO stems from its ability to learn the therapeutic `process' over imitating `content,' a key aspect of ACT, while COT acts as a necessary scaffold for models trained only via imitation. This study establishes that preference-aligned policy optimization can effectively instill ACT competencies in small LLMs, and that the utility of explicit reasoning is highly dependent on the underlying training paradigm.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Comparing Conditional Diffusion Models for Synthesizing Contrast-Enhanced Breast MRI from Pre-Contrast Images
Ibarra, Sebastian, del Riego, Javier, Catanese, Alessandro, Cuba, Julian, Cardona, Julian, Leon, Nataly, Infante, Jonathan, Lekadir, Karim, Diaz, Oliver, Osuala, Richard
Dynamic contrast-enhanced (DCE) MRI is essential for breast cancer diagnosis and treatment. However, its reliance on contrast agents introduces safety concerns, contraindications, increased cost, and workflow complexity. To this end, we present pre-contrast conditioned denoising diffusion probabilistic models to synthesize DCE-MRI, introducing, evaluating, and comparing a total of 22 generative model variants in both single-breast and full breast settings. Towards enhancing lesion fidelity, we introduce both tumor-aware loss functions and explicit tumor segmentation mask conditioning. Using a public multicenter dataset and comparing to respective pre-contrast baselines, we observe that subtraction image-based models consistently outperform post-contrast-based models across five complementary evaluation metrics. Apart from assessing the entire image, we also separately evaluate the region of interest, where both tumor-aware losses and segmentation mask inputs improve evaluation metrics. The latter notably enhance qualitative results capturing contrast uptake, albeit assuming access to tumor localization inputs that are not guaranteed to be available in screening settings. A reader study involving 2 radiologists and 4 MRI technologists confirms the high realism of the synthetic images, indicating an emerging clinical potential of generative contrast-enhancement. We share our codebase at https://github.com/sebastibar/conditional-diffusion-breast-MRI.
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- North America > Cuba (0.04)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.35)
RT-VLM: Re-Thinking Vision Language Model with 4-Clues for Real-World Object Recognition Robustness
Park, Junghyun, Nguyen, Tuan Anh, Min, Dugki
Real world deployments often expose modern object recognition models to domain shifts that precipitate a severe drop in accuracy. Such shifts encompass (i) variations in low level image statistics, (ii) changes in object pose and viewpoint, (iii) partial occlusion, and (iv) visual confusion across adjacent classes. To mitigate this degradation, we introduce the Re-Thinking Vision Language Model (RT -VLM) framework. The foundation of this framework is a unique synthetic dataset generation pipeline that produces images annotated with "4-Clues": precise bounding boxes, class names, detailed object-level captions, and a comprehensive context-level caption for the entire scene. We then perform parameter efficient supervised tuning of Llama 3.2 11B Vision Instruct on this resource. At inference time, a two stage Re-Thinking scheme is executed: the model first emits its own four clues, then re examines these responses as evidence and iteratively corrects them. Across robustness benchmarks that isolate individual domain shifts, RT -VLM consistently surpasses strong baselines. These findings indicate that the integration of structured multimodal evidence with an explicit self critique loop constitutes a promising route toward reliable and transferable visual understanding.
- Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
- North America > United States > Texas > Taylor County (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)