Goto

Collaborating Authors

 Expert Systems


How to Merge Your Multimodal Models Over Time?

arXiv.org Artificial Intelligence

Model merging combines multiple expert models - finetuned from a base foundation model on diverse tasks and domains - into a single, more capable model. However, most existing model merging approaches assume that all experts are available simultaneously. In reality, new tasks and domains emerge progressively over time, requiring strategies to integrate the knowledge of expert models as they become available: a process we call temporal model merging. The temporal dimension introduces unique challenges not addressed in prior work, raising new questions such as: when training for a new task, should the expert model start from the merged past experts or from the original base model? Should we merge all models at each time step? Which merging techniques are best suited for temporal merging? Should different strategies be used to initialize the training and deploy the model? To answer these questions, we propose a unified framework called TIME - Temporal Integration of Model Expertise - which defines temporal model merging across three axes: (1) Initialization Phase, (2) Deployment Phase, and (3) Merging Technique. Using TIME, we study temporal model merging across model sizes, compute budgets, and learning horizons on the FoMo-in-Flux benchmark. Our comprehensive suite of experiments across TIME allows us to uncover key insights for temporal model merging, offering a better understanding of current challenges and best practices for effective temporal model merging.


A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks

arXiv.org Artificial Intelligence

This survey and application guide to multimodal large language models(MLLMs) explores the rapidly developing field of MLLMs, examining their architectures, applications, and impact on AI and Generative Models. Starting with foundational concepts, we delve into how MLLMs integrate various data types, including text, images, video and audio, to enable complex AI systems for cross-modal understanding and generation. It covers essential topics such as training methods, architectural components, and practical applications in various fields, from visual storytelling to enhanced accessibility. Through detailed case studies and technical analysis, the text examines prominent MLLM implementations while addressing key challenges in scalability, robustness, and cross-modal learning. Concluding with a discussion of ethical considerations, responsible AI development, and future directions, this authoritative resource provides both theoretical frameworks and practical insights. It offers a balanced perspective on the opportunities and challenges in the development and deployment of MLLMs, and is highly valuable for researchers, practitioners, and students interested in the intersection of natural language processing and computer vision.


Path-based summary explanations for graph recommenders (extended version)

arXiv.org Artificial Intelligence

Path-based explanations provide intrinsic insights into graph-based recommendation models. However, most previous work has focused on explaining an individual recommendation of an item to a user. In this paper, we propose summary explanations, i.e., explanations that highlight why a user or a group of users receive a set of item recommendations and why an item, or a group of items, is recommended to a set of users as an effective means to provide insights into the collective behavior of the recommender. We also present a novel method to summarize explanations using efficient graph algorithms, specifically the Steiner Tree and the Prize-Collecting Steiner Tree. Our approach reduces the size and complexity of summary explanations while preserving essential information, making explanations more comprehensible for users and more useful to model developers. Evaluations across multiple metrics demonstrate that our summaries outperform baseline explanation methods in most scenarios, in a variety of quality aspects.


eXpath: Explaining Knowledge Graph Link Prediction with Ontological Closed Path Rules

arXiv.org Artificial Intelligence

Link prediction (LP) is crucial for Knowledge Graphs (KG) completion but commonly suffers from interpretability issues. While several methods have been proposed to explain embedding-based LP models, they are generally limited to local explanations on KG and are deficient in providing human interpretable semantics. Based on real-world observations of the characteristics of KGs from multiple domains, we propose to explain LP models in KG with path-based explanations. An integrated framework, namely eXpath, is introduced which incorporates the concept of relation path with ontological closed path rules to enhance both the efficiency and effectiveness of LP interpretation. Notably, the eXpath explanations can be fused with other single-link explanation approaches to achieve a better overall solution. Extensive experiments across benchmark datasets and LP models demonstrate that introducing eXpath can boost the quality of resulting explanations by about 20% on two key metrics and reduce the required explanation time by 61.4%, in comparison to the best existing method. Case studies further highlight eXpath's ability to provide more semantically meaningful explanations through path-based evidence.


Prompt Transfer for Dual-Aspect Cross Domain Cognitive Diagnosis

arXiv.org Artificial Intelligence

Cognitive Diagnosis (CD) aims to evaluate students' cognitive states based on their interaction data, enabling downstream applications such as exercise recommendation and personalized learning guidance. However, existing methods often struggle with accuracy drops in cross-domain cognitive diagnosis (CDCD), a practical yet challenging task. While some efforts have explored exercise-aspect CDCD, such as crosssubject scenarios, they fail to address the broader dual-aspect nature of CDCD, encompassing both student- and exerciseaspect variations. This diversity creates significant challenges in developing a scenario-agnostic framework. To address these gaps, we propose PromptCD, a simple yet effective framework that leverages soft prompt transfer for cognitive diagnosis. PromptCD is designed to adapt seamlessly across diverse CDCD scenarios, introducing PromptCD-S for student-aspect CDCD and PromptCD-E for exercise-aspect CDCD. Extensive experiments on real-world datasets demonstrate the robustness and effectiveness of PromptCD, consistently achieving superior performance across various CDCD scenarios. Our work offers a unified and generalizable approach to CDCD, advancing both theoretical and practical understanding in this critical domain. The implementation of our framework is publicly available at https://github.com/Publisher-PromptCD/PromptCD.


Learning Semantic Association Rules from Internet of Things Data

arXiv.org Artificial Intelligence

Association Rule Mining (ARM) is the task of discovering commonalities in data in the form of logical implications. ARM is used in the Internet of Things (IoT) for different tasks including monitoring and decision-making. However, existing methods give limited consideration to IoT-specific requirements such as heterogeneity and volume. Furthermore, they do not utilize important static domain-specific description data about IoT systems, which is increasingly represented as knowledge graphs. In this paper, we propose a novel ARM pipeline for IoT data that utilizes both dynamic sensor data and static IoT system metadata. Furthermore, we propose an Autoencoder-based Neurosymbolic ARM method (Aerial) as part of the pipeline to address the high volume of IoT data and reduce the total number of rules that are resource-intensive to process. Aerial learns a neural representation of a given data and extracts association rules from this representation by exploiting the reconstruction (decoding) mechanism of an autoencoder. Extensive evaluations on 3 IoT datasets from 2 domains show that ARM on both static and dynamic IoT data results in more generically applicable rules while Aerial can learn a more concise set of high-quality association rules than the state-of-the-art with full coverage over the datasets.


Leveraging Auxiliary Task Relevance for Enhanced Bearing Fault Diagnosis through Curriculum Meta-learning

arXiv.org Artificial Intelligence

The accurate diagnosis of machine breakdowns is crucial for maintaining operational safety in smart manufacturing. Despite the promise shown by deep learning in automating fault identification, the scarcity of labeled training data, particularly for equipment failure instances, poses a significant challenge. This limitation hampers the development of robust classification models. Existing methods like model-agnostic meta-learning (MAML) do not adequately address variable working conditions, affecting knowledge transfer. To address these challenges, a Related Task Aware Curriculum Meta-learning (RT-ACM) enhanced fault diagnosis framework is proposed in this paper, inspired by human cognitive learning processes. RT-ACM improves training by considering the relevance of auxiliary sensor working conditions, adhering to the principle of ``paying more attention to more relevant knowledge", and focusing on ``easier first, harder later" curriculum sampling. This approach aids the meta-learner in achieving a superior convergence state. Extensive experiments on two real-world datasets demonstrate the superiority of RT-ACM framework.


Comparative Analysis of Black-Box and White-Box Machine Learning Model in Phishing Detection

arXiv.org Artificial Intelligence

Background: Explainability in phishing detection model can support a further solution of phishing attack mitigation by increasing trust and understanding how phishing can be detected. Objective: The aims of this study to determine and best recommendation to apply an approach which has several components with abilities to fulfil the critical needs Methods: A methodology starting with analyzing both black-box and white-box models to get the pros and cons specifically in phishing detection. The conclusion of the analysis will be validated by experiment using a set of well-known algorithms and public phishing datasets. Experimental metrics covers 3 measurements such as predictive accuracy and explainability metrics. Conclusion: Both models are comparable in terms of interpretability and consistency, with room for improvement in diverse datasets. EBM as an example of white-box model is generally better suited for applications requiring explainability and actionable insights. Finally, each model, white-box and black-box model has positive and negative aspects both for performance metric and for explainable metric. It is important to consider the objective of model usage.


Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition

arXiv.org Artificial Intelligence

In this work, we investigate how a model's tendency to broadly integrate its parametric knowledge evolves throughout pretraining, and how this behavior affects overall performance, particularly in terms of knowledge acquisition and forgetting. We introduce the concept of knowledge entropy, which quantifies the range of memory sources the model engages with; high knowledge entropy indicates that the model utilizes a wide range of memory sources, while low knowledge entropy suggests reliance on specific sources with greater certainty. Our analysis reveals a consistent decline in knowledge entropy as pretraining advances. We also find that the decline is closely associated with a reduction in the model's ability to acquire and retain knowledge, leading us to conclude that diminishing knowledge entropy (smaller number of active memory sources) impairs the model's knowledge acquisition and retention capabilities. We find further support for this by demonstrating that increasing the activity of inactive memory sources enhances the model's capacity for knowledge acquisition and retention.


Self-Supervised Learning for Graph-Structured Data in Healthcare Applications: A Comprehensive Review

arXiv.org Artificial Intelligence

The abundance of complex and interconnected healthcare data offers numerous opportunities to improve prediction, diagnosis, and treatment. Graph-structured data, which includes entities and their relationships, is well-suited for capturing complex connections. Effectively utilizing this data often requires strong and efficient learning algorithms, especially when dealing with limited labeled data. It is increasingly important for downstream tasks in various domains to utilize self-supervised learning (SSL) as a paradigm for learning and optimizing effective representations from unlabeled data. In this paper, we thoroughly review SSL approaches specifically designed for graph-structured data in healthcare applications. We explore the challenges and opportunities associated with healthcare data and assess the effectiveness of SSL techniques in real-world healthcare applications. Our discussion encompasses various healthcare settings, such as disease prediction, medical image analysis, and drug discovery. We critically evaluate the performance of different SSL methods across these tasks, highlighting their strengths, limitations, and potential future research directions. Ultimately, this review aims to be a valuable resource for both researchers and practitioners looking to utilize SSL for graph-structured data in healthcare, paving the way for improved outcomes and insights in this critical field. To the best of our knowledge, this work represents the first comprehensive review of the literature on SSL applied to graph data in healthcare.