AI-Powered Automated Model Construction for Patient-Specific CFD Simulations of Aortic Flows

arXiv.org Artificial Intelligence

Effectively understanding and managing cardiovascular disease (CVD) requires advanced diagnostic tools capable of accurately characterizing complex hemodynamics within the cardiovascular system. While medical imaging modalities such as computed tomography (CT) and magnetic resonance imaging (MRI) provide high-resolution anatomical detail, they cannot directly capture hemodynamic information (e.g., blood flow patterns, pressure, and wall shear stress fields) critical for understanding vascular function and pathology. To bridge this gap, image-based computational fluid dynamics (CFD) has emerged as a powerful computational paradigm that derives hemodynamic information from anatomical images via conservation laws. Although widely utilized in cardiovascular research, the clinical application of image-based CFD for diagnosis and surgical planning remains limited, largely due to the challenges associated with efficient and accurate model construction [2-4]. Constructing patient-specific vascular models for image-based CFD involves multiple steps, including image segmentation, geometry modeling, and mesh generation for the computational domain, all of which are critical to ensuring the fidelity of the final simulation results. However, the standard workflow relies heavily on manual methods, making it highly labor-intensive and time-consuming.


CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences

arXiv.org Artificial Intelligence

Large language models (LLMs) excel at processing long sequences, boosting demand for key-value (KV) caching. While recent efforts to evict KV cache have alleviated the inference burden, they often fail to allocate resources rationally across layers with different attention patterns. In this paper, we introduce Cascading and Adaptive KV cache Eviction (CAKE), a novel approach that frames KV cache eviction as a "cake-slicing problem." CAKE assesses layer-specific preferences by considering attention dynamics in both spatial and temporal dimensions, allocates rational cache size for layers accordingly, and manages memory constraints in a cascading manner. This approach enables a global view of cache allocation, adaptively distributing resources across diverse attention mechanisms while maintaining memory budgets. CAKE also employs a new eviction indicator that considers the shifting importance of tokens over time, addressing limitations in existing methods that overlook temporal dynamics. Comprehensive experiments on LongBench and NeedleBench show that CAKE maintains model performance with only 3.2% of the KV cache and consistently outperforms current baselines across various models and memory constraints, particularly in low-memory settings. Additionally, CAKE achieves over 10× speedup in decoding latency compared to full cache when processing contexts of 128K tokens with FlashAttention-2. New models such as GPT-4 (Achiam et al., 2023), Claude 3.5 (Anthropic, 2024), LLaMA 3.1 (Dubey et al., 2024) and Mistral Large 2 (AI, 2024) have extended token processing capacities beyond 128K. Shazeer (2019); Ainslie et al. (2023) partially address this issue by merging key-value heads during the training phase. However, optimizing key-value cache without additional training is crucial for efficient inference of long contexts under memory constraints, particularly in typical deployment scenarios where the model structure is fixed.
One way to maintain a manageable KV cache size on the fly is to remove some KV pairs (Xiao et al., 2023; Zhang et al., 2024b; Li et al., 2024b). The idea is to eliminate less important KV pairs based on certain rules. Although recent methods have enhanced pair selection for removal, they typically assign uniform cache sizes across layers, disregarding layer-specific requirements.
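The contrast between uniform and layer-aware cache sizes can be sketched in a few lines. This is a minimal illustration only, not CAKE's actual algorithm: the function names, the per-layer preference scores, and the per-token importance scores are all hypothetical inputs assumed to come from elsewhere (for example, from attention statistics).

```python
def allocate_budgets(preferences, total_budget):
    """Split a total KV-cache budget across layers in proportion to
    (hypothetical) per-layer preference scores."""
    total_pref = sum(preferences)
    budgets = [int(total_budget * p / total_pref) for p in preferences]
    # Flooring can leave a few entries unassigned; hand them out one by one,
    # highest-preference layers first.
    leftover = total_budget - sum(budgets)
    order = sorted(range(len(preferences)),
                   key=lambda i: preferences[i], reverse=True)
    for i in order[:leftover]:
        budgets[i] += 1
    return budgets


def evict(scores, budget):
    """Return the (sorted) positions of the `budget` KV pairs with the
    highest importance scores; every other pair would be evicted."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


# A layer with three times the preference receives three times the cache.
layer_budgets = allocate_budgets([1.0, 3.0], total_budget=8)
kept_positions = evict([0.1, 0.9, 0.5, 0.3], budget=2)
```

A uniform scheme corresponds to passing equal preference scores, in which case every layer receives the same budget up to rounding.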


LLMs' Leaning in European Elections

arXiv.org Artificial Intelligence

The analysis of LLM biases is an active research field. As the application of LLMs in decision-making activities is increasing, their study is critical to better understand their implications on decisional processes. The coherence and the structural preferences that these models are acquiring over several topics could challenge their applications in several fields [4]. The origin of these biases is complicated to study and could come from different steps in LLM training. For example, these biases could be acquired during the pre-training phase, supervised fine-tuning phase, or even during the final alignment phase. This article focuses on understanding the extent of the political biases of LLMs through two experiments. The first experiment aims to show the left leaning of multiple LLMs in the context of several virtual European elections (Section 4.1). The second experiment shows that LLMs consider "stupidity" and "ignorance" to be human characteristics that make voting for the right wing more probable (Section 4.2). As different models could exhibit different leanings, we tested four of the most used LLMs in both experiments (Table 1).


Integrating mobile and fixed monitoring data for high-resolution PM2.5 mapping using machine learning

arXiv.org Artificial Intelligence

Constructing high-resolution air pollution maps at lower cost is crucial for sustainable city management and public health risk assessment. However, traditional fixed-site monitoring lacks spatial coverage, while mobile low-cost sensors exhibit significant data instability. This study integrates PM2.5 data from 320 taxi-mounted mobile low-cost sensors and 52 fixed monitoring stations to address these limitations. By employing machine learning methods, an appropriate mapping relationship was established between fixed and mobile monitoring concentrations. The resulting pollution maps achieved 500-meter spatial and 5-minute temporal resolutions, showing close alignment with fixed monitoring data (+4.35% bias) but significant deviation from raw mobile data (-31.77%). The fused map exhibits the fine-scale spatial variability also observed in the mobile pollution map, while showing stable temporal variability closer to that of the fixed pollution map (fixed: 1.12 ± 0.73%, mobile: 3.15 ± 2.44%, mapped: 1.01 ± 0.65%). These findings demonstrate the potential of large-scale mobile low-cost sensor networks for high-resolution air quality mapping, supporting targeted urban environmental governance and health risk mitigation.


RL-TIME: Reinforcement Learning-based Task Replication in Multicore Embedded Systems

arXiv.org Artificial Intelligence

Embedded systems power many modern applications and must often meet strict reliability, real-time, thermal, and power requirements. Task replication can improve reliability by duplicating a task's execution to handle transient and permanent faults, but blindly applying replication often leads to excessive overhead and higher temperatures. Existing design-time methods typically choose the number of replicas based on worst-case conditions, which can waste resources under normal operation. In this paper, we present RL-TIME, a reinforcement learning-based approach that dynamically decides the number of replicas according to actual system conditions. By considering both the reliability target and a core-level Thermal Safe Power (TSP) constraint at run-time, RL-TIME adapts the replication strategy to avoid unnecessary overhead and overheating. Experimental results show that, compared to state-of-the-art methods, RL-TIME reduces power consumption by 63%, increases schedulability by 53%, and respects TSP 72% more often.
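To illustrate why the replica count matters, the sketch below picks the smallest number of replicas that meets a reliability target. It is not the RL-TIME policy: it assumes, purely for illustration, that replicas fail independently and that a task succeeds if at least one replica succeeds, so the combined reliability is 1 - (1 - r)^n. The function name and parameters are hypothetical.

```python
def min_replicas(task_reliability, target, max_replicas=8):
    """Smallest replica count n with 1 - (1 - r)^n >= target, or None
    if the target cannot be met within max_replicas.

    Assumes independent replica failures (an illustrative simplification).
    """
    for n in range(1, max_replicas + 1):
        combined = 1.0 - (1.0 - task_reliability) ** n
        if combined >= target:
            return n
    return None


# With r = 0.9 per replica: one replica gives 0.9, two give 0.99,
# three give 0.999, so a 0.995 target needs three replicas.
replicas_needed = min_replicas(0.9, 0.995)
```

A fixed worst-case choice would always run the maximum count, whereas a run-time approach can re-evaluate the count as conditions such as the observed fault rate or the thermal budget change.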


HAR-DoReMi: Optimizing Data Mixture for Self-Supervised Human Activity Recognition Across Heterogeneous IMU Datasets

arXiv.org Artificial Intelligence

Cross-dataset Human Activity Recognition (HAR) suffers from limited model generalization, hindering its practical deployment. To address this critical challenge, inspired by the success of DoReMi in Large Language Models (LLMs), we introduce a data mixture optimization strategy for pre-training HAR models, aiming to improve the recognition performance across heterogeneous datasets. However, directly applying DoReMi to the HAR field encounters new challenges due to the continuous, multi-channel and intrinsically heterogeneous characteristics of IMU sensor data. To overcome these limitations, we propose a novel framework HAR-DoReMi, which introduces a masked reconstruction task based on Mean Squared Error (MSE) loss. By replacing the discrete language sequence prediction task, which relies on the Negative Log-Likelihood (NLL) loss, in the original DoReMi framework, the proposed framework is inherently more appropriate for handling the continuous and multi-channel characteristics of IMU data. In addition, HAR-DoReMi integrates the Mahony fusion algorithm into the self-supervised HAR pre-training, aiming to mitigate the heterogeneity of varying sensor orientation. This is achieved by estimating the sensor orientation within each dataset and facilitating alignment with a unified coordinate system, thereby improving the cross-dataset generalization ability of the HAR model. Experimental evaluation on multiple cross-dataset HAR transfer tasks demonstrates that HAR-DoReMi improves the accuracy by an average of 6.51%, compared to the current state-of-the-art method with only approximately 30% to 50% of the data usage. These results confirm the effectiveness of HAR-DoReMi in improving the generalization and data efficiency of pre-training HAR models, underscoring its significant potential to facilitate the practical deployment of HAR technology.
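A masked reconstruction objective with MSE loss can be sketched as follows. This is an assumed, simplified form rather than the HAR-DoReMi implementation: the function name, the per-time-step masking scheme, and the list-of-lists data layout are illustrative choices.

```python
def masked_mse(original, reconstructed, mask):
    """Mean squared error computed over masked time steps only.

    original, reconstructed: lists of time steps, each a list of channel
        values (e.g. accelerometer and gyroscope axes).
    mask: list of booleans, True where the time step was hidden from the
        model and must be reconstructed.
    """
    total, count = 0.0, 0
    for x_t, y_t, hidden in zip(original, reconstructed, mask):
        if not hidden:
            continue  # visible steps do not contribute to the loss
        for x, y in zip(x_t, y_t):
            total += (x - y) ** 2
            count += 1
    if count == 0:
        raise ValueError("mask selects no time steps")
    return total / count


# Three time steps, two channels; only the last two steps are masked.
signal = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
prediction = [[5.0, 5.0], [1.0, 3.0], [2.0, 2.0]]
loss = masked_mse(signal, prediction, [False, True, True])
```

Because the loss is defined on real-valued channels, it applies directly to continuous sensor streams, whereas an NLL loss requires a discrete vocabulary to predict over.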


ASD Classification on Dynamic Brain Connectome using Temporal Random Walk with Transformer-based Dynamic Network Embedding

arXiv.org Artificial Intelligence

Autism Spectrum Disorder (ASD) is a complex neurological condition characterized by varied developmental impairments, especially in communication and social interaction. Accurate and early diagnosis of ASD is crucial for effective intervention, which is enhanced by richer representations of brain activity. The brain functional connectome, which refers to the statistical relationships between different brain regions measured through neuroimaging, provides crucial insights into brain function. Traditional static methods often fail to capture the dynamic nature of brain activity. In contrast, dynamic brain connectome analysis provides a more comprehensive view by capturing the temporal variations in the brain. We propose BrainTWT, a novel dynamic network embedding approach that captures the temporal evolution of brain connectivity and also considers the dynamics between different temporal network snapshots. BrainTWT employs temporal random walks to capture dynamics across different temporal network snapshots and leverages the Transformer's ability to model long-term dependencies in sequential data to learn discriminative embeddings from these temporal sequences using temporal structure prediction tasks. The experimental evaluation, utilizing the Autism Brain Imaging Data Exchange (ABIDE) dataset, demonstrates that BrainTWT outperforms baseline methods in ASD classification.
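The idea of a temporal random walk over network snapshots can be sketched as below. This is a generic version under stated assumptions, not BrainTWT's exact procedure: snapshots are represented as adjacency dictionaries, and each step may follow any edge in the current or a later snapshot, so the time index along a walk never decreases. All names are hypothetical.

```python
import random


def temporal_random_walk(snapshots, start, length, rng):
    """Walk over a list of graph snapshots without going back in time.

    snapshots: list indexed by time; each item maps a node to a list of
        its neighbours in that snapshot.
    Returns a list of (node, time) pairs, beginning with (start, 0).
    """
    node, t = start, 0
    walk = [(node, t)]
    for _ in range(length):
        # Candidate moves: edges leaving `node` now or in a later snapshot.
        candidates = [
            (neighbour, s)
            for s in range(t, len(snapshots))
            for neighbour in snapshots[s].get(node, [])
        ]
        if not candidates:
            break  # dead end: no edge at the current or any later time
        node, t = rng.choice(candidates)
        walk.append((node, t))
    return walk


# Three snapshots of a toy brain network with regions "a", "b", "c".
toy_snapshots = [{"a": ["b"]}, {"b": ["c"]}, {"c": ["a"]}]
toy_walk = temporal_random_walk(toy_snapshots, "a", 3, random.Random(0))
```

The resulting node sequences play the role of sentences: a sequence model such as a Transformer can then be trained on them to learn node or graph embeddings.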


SAM2 for Image and Video Segmentation: A Comprehensive Survey

arXiv.org Artificial Intelligence

Despite significant advances in deep learning for image and video segmentation, existing models continue to face challenges in cross-domain adaptability and generalization. Image and video segmentation are fundamental tasks in computer vision with wide-ranging applications in healthcare, agriculture, industrial inspection, and autonomous driving. With the advent of large-scale foundation models, SAM2, an improved version of SAM (Segment Anything Model), has been optimized for segmentation tasks, demonstrating enhanced performance in complex scenarios. However, SAM2's adaptability and limitations in specific domains require further investigation. This paper systematically analyzes the application of SAM2 in image and video segmentation and evaluates its performance in various fields. We begin by introducing the foundational concepts of image segmentation, categorizing foundation models, and exploring the technical characteristics of SAM and SAM2. Subsequently, we delve into SAM2's applications in static image and video segmentation, emphasizing its performance in specialized areas such as medical imaging and the challenges of cross-domain adaptability. As part of our research, we reviewed over 200 related papers to provide a comprehensive analysis of the topic. Finally, the paper highlights the strengths and weaknesses of SAM2 in segmentation tasks, identifies the technical challenges it faces, and proposes future development directions. This review provides valuable insights and practical recommendations for optimizing and applying SAM2 in real-world scenarios.


Adaptive Deep Learning for Multiclass Breast Cancer Classification via Misprediction Risk Analysis

arXiv.org Artificial Intelligence

Breast cancer remains one of the leading causes of cancer-related deaths worldwide. Early detection is crucial for improving patient outcomes, yet the diagnostic process is often complex and prone to inconsistencies among pathologists. Computer-aided diagnostic approaches have significantly enhanced breast cancer detection, particularly in binary classification (benign vs. malignant). However, these methods face challenges in multiclass classification, leading to frequent mispredictions. In this work, we propose a novel adaptive learning approach for multiclass breast cancer classification using H&E-stained histopathology images. First, we introduce a misprediction risk analysis framework that quantifies and ranks the likelihood of an image being mislabeled by a classifier. This framework leverages an interpretable risk model that requires only a small number of labeled samples for training. Next, we present an adaptive learning strategy that fine-tunes classifiers based on the specific characteristics of a given dataset. This approach minimizes misprediction risk, allowing the classifier to adapt effectively to the target workload. We evaluate our proposed solutions on real benchmark datasets, demonstrating that our risk analysis framework more accurately identifies mispredictions compared to existing methods. Furthermore, our adaptive learning approach significantly improves the performance of state-of-the-art deep neural network classifiers.


cantnlp@DravidianLangTech2025: A Bag-of-Sounds Approach to Multimodal Hate Speech Detection

arXiv.org Artificial Intelligence

This paper presents the systems and results for the Multimodal Social Media Data Analysis in Dravidian Languages (MSMDA-DL) shared task at the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages (DravidianLangTech-2025). We took a "bag-of-sounds" approach by training our hate speech detection system on the speech (audio) data using transformed Mel spectrogram measures. While our candidate model performed poorly on the test set, our approach offered promising results during training and development for Malayalam and Tamil. With sufficient and well-balanced training data, our results show that it is feasible to use both text and speech (audio) data in the development of multimodal hate speech detection systems.