Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models

arXiv.org Artificial Intelligence

Investigating value alignment in Large Language Models (LLMs) based on cultural context has become a critical area of research. However, similar biases have not been extensively explored in large vision-language models (VLMs). As the scale of multimodal models continues to grow, it becomes increasingly important to assess whether images can serve as reliable proxies for culture and how these values are embedded through the integration of both visual and textual data. In this paper, we conduct a thorough evaluation of multimodal models at different scales, focusing on their alignment with cultural values. Our findings reveal that, much like LLMs, VLMs exhibit sensitivity to cultural values, but their performance in aligning with these values is highly context-dependent. While VLMs show potential in improving value understanding through the use of images, this alignment varies significantly across contexts, highlighting the complexities and underexplored challenges in the alignment of multimodal models.


Discovering the influence of personal features in psychological processes using Artificial Intelligence techniques: the case of COVID19 lockdown in Spain

arXiv.org Artificial Intelligence

At the end of 2019, an outbreak of a novel coronavirus was reported in China, leading to the COVID-19 pandemic. In Spain, the first cases were detected in late January 2020, and by mid-March, infections had surpassed 5,000. In March, the Spanish government started a nationwide lockdown to contain the spread of the virus. While isolation measures were necessary, they posed significant psychological and socioeconomic challenges, particularly for vulnerable populations. Understanding the psychological impact of lockdown and the factors influencing mental health is crucial for informing future public health policies. This study analyzes the influence of personal, socioeconomic, general health and living condition factors on psychological states during lockdown using AI techniques. A dataset collected through an online questionnaire was processed using two workflows, each structured into three stages. First, individuals were categorized based on psychological assessments, either directly or in combination with unsupervised learning techniques. Second, various Machine Learning classifiers were trained to distinguish between the identified groups. Finally, feature importance analysis was conducted to identify the most influential variables related to different psychological conditions. The evaluated models demonstrated strong performance, with accuracy exceeding 80% and often surpassing 90%, particularly for Random Forest, Decision Trees, and Support Vector Machines. Sensitivity and specificity analyses revealed that models performed well across different psychological conditions, with the health impacts subset showing the highest reliability. For diagnosing vulnerability, models achieved over 90% accuracy, except for less vulnerable individuals using living environment and economic status features, where performance was slightly lower.
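The abstract above reports accuracy together with sensitivity and specificity. As a reminder of how these per-class metrics relate, here is a minimal sketch in plain Python. The labels are hypothetical toy values, not data from the study, and the function name is an assumption made for illustration.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Return (sensitivity, specificity), treating `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive case correctly detected
            else:
                fn += 1  # positive case missed
        else:
            if pred == positive:
                fp += 1  # negative case flagged as positive
            else:
                tn += 1  # negative case correctly rejected
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy labels (illustrative only).
y_true = ["vulnerable"] * 4 + ["resilient"] * 6
y_pred = ["vulnerable", "vulnerable", "vulnerable", "resilient",
          "resilient", "resilient", "resilient", "resilient", "resilient", "vulnerable"]

sens, spec = sensitivity_specificity(y_true, y_pred, positive="vulnerable")
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

A model can reach high accuracy while missing many cases of a small class, which is why sensitivity and specificity are usually reported alongside it.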


UPCMR: A Universal Prompt-guided Model for Random Sampling Cardiac MRI Reconstruction

arXiv.org Artificial Intelligence

Cardiac magnetic resonance imaging (CMR) is vital for diagnosing heart diseases, but long scan times remain a major drawback. To address this, accelerated imaging techniques have been introduced by undersampling k-space, which reduces the quality of the resulting images. Recent deep learning advancements aim to speed up scanning while preserving quality, but adapting to various sampling modes and undersampling factors remains challenging. Therefore, building a universal model is a promising direction. In this work, we introduce UPCMR, a universal unrolled model designed for CMR reconstruction. This model incorporates two kinds of learnable prompts, an undersampling-specific prompt and a spatial-specific prompt, and integrates them with a UNet structure in each block. Overall, using the CMRxRecon2024 challenge dataset for training and validation, the UPCMR model substantially enhances reconstructed image quality across all random sampling scenarios through an effective training strategy compared to some traditional methods, demonstrating strong adaptability potential for this task.


Detecting LLM Fact-conflicting Hallucinations Enhanced by Temporal-logic-based Reasoning

arXiv.org Artificial Intelligence

Large language models (LLMs) face the challenge of hallucinations - outputs that seem coherent but are actually incorrect. A particularly damaging type is fact-conflicting hallucination (FCH), where generated content contradicts established facts. Addressing FCH presents three main challenges: 1) Automatically constructing and maintaining large-scale benchmark datasets is difficult and resource-intensive; 2) Generating complex and efficient test cases that the LLM has not been trained on - especially those involving intricate temporal features - is challenging, yet crucial for eliciting hallucinations; and 3) Validating the reasoning behind LLM outputs is inherently difficult, particularly with complex logical relationships, as it requires transparency in the model's decision-making process. LLMs are tested using these cases through template-based prompts, which require them to generate both answers and reasoning steps. To validate the reasoning, we propose two semantic-aware oracles that compare the semantic structure of LLM outputs to the ground truths. Key insights reveal that LLMs struggle with out-of-distribution knowledge and logical reasoning. These findings highlight the importance of continued efforts to detect and mitigate hallucinations in LLMs.
Large Language Models (LLMs) have revolutionized language processing, demonstrating impressive text generation and comprehension capabilities with diverse applications. However, despite their growing use, LLMs face significant security and privacy challenges [1], [2], [3], [4], [5], which affect their overall effectiveness and reliability. A critical issue is the phenomenon of hallucination, where LLMs generate outputs that are coherent but factually incorrect or irrelevant. This tendency to produce misleading information compromises the safety and usability of LLM-based systems. This paper focuses on fact-conflicting hallucination (FCH), the most prominent form of hallucination in LLMs. FCH occurs when LLMs generate content that directly contradicts established facts. For instance, as illustrated in Figure 1, an LLM incorrectly asserts that "Haruki Murakami won the Nobel Prize in Literature in 2016", whereas the fact is that "Haruki Murakami has not won the Nobel Prize, though he has received numerous other literary awards". Such inaccuracies can lead to significant user confusion and undermine the trust and reliability that are crucial for LLM applications. N. Li, K. Wang, and H. Wang are with Huazhong University of Science and Technology, China. Song is with the National University of Singapore, Singapore. Li is with the University of New South Wales, Australia.


Secure and Efficient Watermarking for Latent Diffusion Models in Model Distribution Scenarios

arXiv.org Artificial Intelligence

Latent diffusion models have exhibited considerable potential in generative tasks. Watermarking is considered an option for safeguarding the copyright of generative models and preventing their misuse. However, in model distribution scenarios, the accessibility of models to a large number of users brings new challenges to the security, efficiency and robustness of existing watermark solutions. To address these issues, we propose a secure and efficient watermarking solution. A new security mechanism is designed to prevent watermark leakage and watermark escape, which considers watermark randomness and watermark-model association as two constraints for mandatory watermark injection. To reduce the time cost of training the security module, watermark injection and the security mechanism are decoupled, ensuring that fine-tuning the VAE only accomplishes the security mechanism without the burden of learning watermark patterns. A watermark distribution-based verification strategy is proposed to enhance the robustness against diverse attacks in model distribution scenarios. Experimental results show that our watermarking consistently outperforms six existing baselines on effectiveness and robustness against ten image processing attacks and adversarial attacks, while enhancing security in the distribution scenarios.


How Expressive are Knowledge Graph Foundation Models?

arXiv.org Artificial Intelligence

Knowledge Graph Foundation Models (KGFMs) are at the frontier for deep learning on knowledge graphs (KGs), as they can generalize to completely novel knowledge graphs with different relational vocabularies. Despite their empirical success, our theoretical understanding of KGFMs remains very limited. In this paper, we conduct a rigorous study of the expressive power of KGFMs. Specifically, we show that the expressive power of KGFMs directly depends on the motifs that are used to learn the relation representations. We then observe that the most typical motifs used in the existing literature are binary, as the representations are learned based on how pairs of relations interact, which limits the model's expressiveness. As part of our study, we design more expressive KGFMs using richer motifs, which necessitate learning relation representations based on, e.g., how triples of relations interact with each other. Finally, we empirically validate our theoretical findings, showing that the use of richer motifs results in better performance on a wide range of datasets drawn from different domains.


Language Models are Few-Shot Graders

arXiv.org Artificial Intelligence

Evaluating student work is a critical component of effective student learning, and automating this process can significantly reduce the workload on human graders. Automatic Short Answer Grading (ASAG) systems, enabled by advancements in Large Language Models (LLMs), offer a promising solution for assessing and providing instant feedback for open-ended student responses. In this paper, we present an ASAG pipeline leveraging state-of-the-art LLMs. Our new LLM-based ASAG pipeline achieves better performance than existing custom-built models on the same datasets. We also compare the grading performance of three OpenAI models: GPT-4, GPT-4o, and o1-preview. Our results demonstrate that GPT-4o achieves the best balance between accuracy and cost-effectiveness. On the other hand, o1-preview, despite higher accuracy, exhibits a larger variance in error that makes it less practical for classroom use. We investigate the effects of incorporating instructor-graded examples into prompts using no examples, random selection, and Retrieval-Augmented Generation (RAG)-based selection strategies. Our findings indicate that providing graded examples enhances grading accuracy, with RAG-based selection outperforming random selection. Additionally, integrating grading rubrics improves accuracy by offering a structured standard for evaluation.
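To make the idea of retrieval-based example selection concrete, the following is a minimal sketch, not the pipeline from the paper. It assumes a simple bag-of-words cosine similarity as a stand-in for whatever retriever the authors use (a real system would more likely use embedding vectors), and every function name, field name, and example text is hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a crude stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_examples(answer, graded_examples, k=2):
    """Return the k instructor-graded examples most similar to the student answer."""
    query = bag_of_words(answer)
    ranked = sorted(graded_examples,
                    key=lambda ex: cosine(query, bag_of_words(ex["answer"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(question, rubric, answer, examples):
    """Assemble a few-shot grading prompt (the layout is an assumption)."""
    lines = [f"Question: {question}", f"Rubric: {rubric}", ""]
    for ex in examples:
        lines += [f"Example answer: {ex['answer']}", f"Instructor grade: {ex['grade']}", ""]
    lines += [f"Student answer: {answer}", "Grade:"]
    return "\n".join(lines)


# Hypothetical graded pool and student answer (illustrative only).
graded = [
    {"answer": "Photosynthesis converts light energy into chemical energy stored in glucose.", "grade": 5},
    {"answer": "Plants eat soil to grow.", "grade": 0},
    {"answer": "Mitochondria produce ATP through cellular respiration.", "grade": 1},
]
student = "Plants use light energy to make glucose."

selected = select_examples(student, graded, k=2)
prompt = build_prompt("What does photosynthesis do?",
                      "Full marks require naming light energy and glucose.",
                      student, selected)
print(prompt)
```

Random selection would replace `select_examples` with a random sample from the same pool, which is the baseline the abstract compares against.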


Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare

arXiv.org Artificial Intelligence

We know from prior work that LLMs encode social biases, and that this manifests in clinical tasks. In this work we adopt tools from mechanistic interpretability to unveil sociodemographic representations and biases within LLMs in the context of healthcare. Specifically, we ask: Can we identify activations within LLMs that encode sociodemographic information (e.g., gender, race)? We find that gender information is highly localized in middle MLP layers and can be reliably manipulated at inference time via patching. Such interventions can surgically alter generated clinical vignettes for specific conditions, and also influence downstream clinical predictions which correlate with gender, e.g., patient risk of depression. We find that representation of patient race is somewhat more distributed, but can also be intervened upon, to a degree. To our knowledge, this is the first application of mechanistic interpretability methods to LLMs for healthcare.


Revisiting Privacy, Utility, and Efficiency Trade-offs when Fine-Tuning Large Language Models

arXiv.org Artificial Intelligence

We study the inherent trade-offs in minimizing privacy risks and maximizing utility, while maintaining high computational efficiency, when fine-tuning large language models (LLMs). A number of recent works in privacy research have attempted to mitigate privacy risks posed by memorizing fine-tuning data by using differentially private training methods (e.g., DP), albeit at a significantly higher computational cost (inefficiency). In parallel, several works in systems research have focussed on developing (parameter) efficient fine-tuning methods (e.g., LoRA), but few works, if any, investigated whether such efficient methods enhance or diminish privacy risks. In this paper, we investigate this gap and arrive at a surprising conclusion: efficient fine-tuning methods like LoRA mitigate privacy risks similar to private fine-tuning methods like DP. Our empirical finding directly contradicts prevailing wisdom that privacy and efficiency objectives are at odds during fine-tuning. Our finding is established by (a) carefully defining measures of privacy and utility that distinguish between memorizing sensitive and non-sensitive tokens in training and test datasets used in fine-tuning and (b) extensive evaluations using multiple open-source language models from Pythia, Gemma, and Llama families and different domain-specific datasets.


Application of Context-dependent Interpretation of Biosignals Recognition to Control a Bionic Multifunctional Hand Prosthesis

arXiv.org Artificial Intelligence

The paper presents an original method for controlling a surface-electromyography-driven (sEMG) prosthesis. A context-dependent recognition system is proposed in which the same class of sEMG signals may have a different interpretation, depending on the context. This allows the repertoire of performed movements to be increased. The proposed structure of the context-dependent recognition system includes unambiguously defined decision sequences covering the overall action of the prosthesis, i.e. the so-called boxes. Because the boxes are mutually isolated environments, each box has its own interpretation of the recognition result, as well as a separate classifier focused on the local recognition task. Due to the freedom to assign contextual meanings to classes of biosignals, the construction procedure of the classifier can be optimised in terms of the local classification quality in a given box or the classification quality of the entire system. In the paper, two optimisation problems are formulated, differing in the adopted constraints on optimisation variables, and methods for solving them are developed based on an exhaustive search and an evolutionary algorithm. Experimental studies were conducted using signals from 1 able-bodied person with simulation of amputation and 10 volunteers with transradial amputations. The study compared the classical recognition system and the context-dependent system for various classifier models. An unusual testing strategy was adopted in the research, taking into account the specificity of the considered recognition task, and two original quality measures resulting from this scheme were then applied. The results obtained confirm the hypothesis that the application of the context-dependent classifier leads to an improvement in classification quality.
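The core idea, that one recognized signal class can mean different movements in different boxes, can be sketched as a small lookup-driven state machine. This is an illustrative assumption, not the authors' system: the box names, signal classes, and movements below are all hypothetical, and the per-box classifiers described in the abstract are omitted (the input is assumed to be already-recognized class labels).

```python
# Each box maps a recognized signal class to (movement, next box).
# All names here are hypothetical and chosen only for illustration.
BOXES = {
    "idle": {
        "flexion": ("open_hand", "grasp_select"),
        "extension": ("rest", "idle"),
    },
    "grasp_select": {
        "flexion": ("power_grip", "holding"),
        "extension": ("pinch_grip", "holding"),
    },
    "holding": {
        "flexion": ("tighten_grip", "holding"),
        "extension": ("release", "idle"),
    },
}


def run_sequence(recognized_classes, start_box="idle"):
    """Interpret each recognized class in the context of the current box."""
    box = start_box
    movements = []
    for signal_class in recognized_classes:
        movement, box = BOXES[box][signal_class]
        movements.append(movement)
    return movements, box


movements, final_box = run_sequence(["flexion", "flexion", "flexion", "extension"])
print(movements, final_box)

# Two signal classes yield six distinct movements across three boxes.
distinct = {move for table in BOXES.values() for move, _ in table.values()}
print(len(distinct))
```

In this toy layout only two signal classes need to be distinguished at any moment, while the set of reachable movements is larger, which is the trade the abstract describes.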