Goto

Collaborating Authors

 Overview


A review of feature selection strategies utilizing graph data structures and knowledge graphs

arXiv.org Machine Learning

Feature selection in Knowledge Graphs (KGs) are increasingly utilized in diverse domains, including biomedical research, Natural Language Processing (NLP), and personalized recommendation systems. This paper delves into the methodologies for feature selection within KGs, emphasizing their roles in enhancing machine learning (ML) model efficacy, hypothesis generation, and interpretability. Through this comprehensive review, we aim to catalyze further innovation in feature selection for KGs, paving the way for more insightful, efficient, and interpretable analytical models across various domains. Our exploration reveals the critical importance of scalability, accuracy, and interpretability in feature selection techniques, advocating for the integration of domain knowledge to refine the selection process. We highlight the burgeoning potential of multi-objective optimization and interdisciplinary collaboration in advancing KG feature selection, underscoring the transformative impact of such methodologies on precision medicine, among other fields. The paper concludes by charting future directions, including the development of scalable, dynamic feature selection algorithms and the integration of explainable AI principles to foster transparency and trust in KG-driven models.


Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

arXiv.org Machine Learning

To build safe and reliable graph machine learning systems, unsupervised graph-level anomaly detection (GLAD) and unsupervised graph-level out-of-distribution (OOD) detection (GLOD) have received significant attention in recent years. Though those two lines of research indeed share the same objective, they have been studied independently in the community due to distinct evaluation setups, creating a gap that hinders the application and evaluation of methods from one to the other. To bridge the gap, in this work, we present a Unified Benchmark for unsupervised Graph-level OOD and anomaly Detection (our method), a comprehensive evaluation framework that unifies GLAD and GLOD under the concept of generalized graph-level OOD detection. Our benchmark encompasses 35 datasets spanning four practical anomaly and OOD detection scenarios, facilitating the comparison of 16 representative GLAD/GLOD methods. We conduct multi-dimensional analyses to explore the effectiveness, generalizability, robustness, and efficiency of existing methods, shedding light on their strengths and limitations. Furthermore, we provide an open-source codebase (https://github.com/UB-GOLD/UB-GOLD) of our method to foster reproducible research and outline potential directions for future investigations based on our insights.


EasyECR: A Library for Easy Implementation and Evaluation of Event Coreference Resolution Models

arXiv.org Artificial Intelligence

Event Coreference Resolution (ECR) is the task of clustering event mentions that refer to the same real-world event. Despite significant advancements, ECR research faces two main challenges: limited generalizability across domains due to narrow dataset evaluations, and difficulties in comparing models within diverse ECR pipelines. To address these issues, we develop EasyECR, the first open-source library designed to standardize data structures and abstract ECR pipelines for easy implementation and fair evaluation. More specifically, EasyECR integrates seven representative pipelines and ten popular benchmark datasets, enabling model evaluations on various datasets and promoting the development of robust ECR pipelines. By conducting extensive evaluation via our EasyECR, we find that, \lowercase\expandafter{\romannumeral1}) the representative ECR pipelines cannot generalize across multiple datasets, hence evaluating ECR pipelines on multiple datasets is necessary, \lowercase\expandafter{\romannumeral2}) all models in ECR pipelines have a great effect on pipeline performance, therefore, when one model in ECR pipelines are compared, it is essential to ensure that the other models remain consistent. Additionally, reproducing ECR results is not trivial, and the developed library can help reduce this discrepancy. The experimental results provide valuable baselines for future research.


Learning to Discover Knowledge: A Weakly-Supervised Partial Domain Adaptation Approach

arXiv.org Artificial Intelligence

Domain adaptation has shown appealing performance by leveraging knowledge from a source domain with rich annotations. However, for a specific target task, it is cumbersome to collect related and high-quality source domains. In real-world scenarios, large-scale datasets corrupted with noisy labels are easy to collect, stimulating a great demand for automatic recognition in a generalized setting, i.e., weakly-supervised partial domain adaptation (WS-PDA), which transfers a classifier from a large source domain with noises in labels to a small unlabeled target domain. As such, the key issues of WS-PDA are: 1) how to sufficiently discover the knowledge from the noisy labeled source domain and the unlabeled target domain, and 2) how to successfully adapt the knowledge across domains. In this paper, we propose a simple yet effective domain adaptation approach, termed as self-paced transfer classifier learning (SP-TCL), to address the above issues, which could be regarded as a well-performing baseline for several generalized domain adaptation tasks. The proposed model is established upon the self-paced learning scheme, seeking a preferable classifier for the target domain. Specifically, SP-TCL learns to discover faithful knowledge via a carefully designed prudent loss function and simultaneously adapts the learned knowledge to the target domain by iteratively excluding source examples from training under the self-paced fashion. Extensive evaluations on several benchmark datasets demonstrate that SP-TCL significantly outperforms state-of-the-art approaches on several generalized domain adaptation tasks.


Capturing Temporal Components for Time Series Classification

arXiv.org Artificial Intelligence

Analyzing sequential data is crucial in many domains, particularly due to the abundance of data collected from the Internet of Things paradigm. Time series classification, the task of categorizing sequential data, has gained prominence, with machine learning approaches demonstrating remarkable performance on public benchmark datasets. However, progress has primarily been in designing architectures for learning representations from raw data at fixed (or ideal) time scales, which can fail to generalize to longer sequences. This work introduces a \textit{compositional representation learning} approach trained on statistically coherent components extracted from sequential data. Based on a multi-scale change space, an unsupervised approach is proposed to segment the sequential data into chunks with similar statistical properties. A sequence-based encoder model is trained in a multi-task setting to learn compositional representations from these temporal components for time series classification. We demonstrate its effectiveness through extensive experiments on publicly available time series classification benchmarks. Evaluating the coherence of segmented components shows its competitive performance on the unsupervised segmentation task.


A Survey of Retrieval Algorithms in Ad and Content Recommendation Systems

arXiv.org Artificial Intelligence

In RAG, a retrieval component is used to fetch relevant documents or data points, which are then used to inform This survey examines the most effective retrieval and enhance the generation process of the LLM. Similarly, algorithms utilized in ad recommendation and content in recommendation systems, retrieval methods serve to bring recommendation systems. Ad targeting algorithms forth pertinent user and item data, which can then be leveraged rely on detailed user profiles and behavioral by the recommendation algorithms to produce highly data to deliver personalized advertisements, personalized and accurate recommendations. This parallel thereby driving revenue through targeted placements.


Cross-level Requirement Traceability: A Novel Approach Integrating Bag-of-Words and Word Embedding for Enhanced Similarity Functionality

arXiv.org Artificial Intelligence

A software requirement is a description of what the software will do and how it will be expected to perform [1]. Requirements vary in degree of detail, starting from business requirements, which are the highest levels of abstraction that focus on the general purpose of the target system, all the way up to software requirements, which precisely describe the characteristics and features required from the system [2]. Depending on these types, the task of requirements traceability varies between linking requirements from one level or linking requirements from different levels (such as linking business requirements to system requirements), which is called cross-level [2]. In general, requirements traceability is defined as "the ability to describe and follow the life of a requirement in both forwards and backward direction through periods of ongoing refinement and iteration" [3]. This task plays a crucial role in the success of software development so that the development team can better understand the needs of customers and system users, and thus be able to design and develop a system that effectively meets those requirements [4].


Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) become an important way of information seeking, there have been increasing concerns about the unethical content LLMs may generate. In this paper, we conduct a rigorous evaluation of LLMs' implicit bias towards certain groups by attacking them with carefully crafted instructions to elicit biased responses. Our attack methodology is inspired by psychometric principles in cognitive and social psychology. We propose three attack approaches, i.e., Disguise, Deception, and Teaching, based on which we built evaluation datasets for four common bias types. Each prompt attack has bilingual versions. Extensive evaluation of representative LLMs shows that 1) all three attack methods work effectively, especially the Deception attacks; 2) GLM-3 performs the best in defending our attacks, compared to GPT-3.5 and GPT-4; 3) LLMs could output content of other bias types when being taught with one type of bias. Our methodology provides a rigorous and effective way of evaluating LLMs' implicit bias and will benefit the assessments of LLMs' potential ethical risks.


Towards Event-oriented Long Video Understanding

arXiv.org Artificial Intelligence

With the rapid development of video Multimodal Large Language Models (MLLMs), numerous benchmarks have been proposed to assess their video understanding capability. However, due to the lack of rich events in the videos, these datasets may suffer from the short-cut bias that the answers can be deduced from a few frames, without the need to watch the entire video. To address this issue, we introduce Event-Bench, an event-oriented long video understanding benchmark built on existing datasets and human annotations. Event-Bench includes six event-related tasks and 2,190 test instances to comprehensively evaluate video event understanding ability. Additionally, we propose Video Instruction Merging~(VIM), a cost-effective method that enhances video MLLMs using merged, event-intensive video instructions, addressing the scarcity of human-annotated, event-intensive data. Extensive experiments show that the best-performing model, GPT-4o, achieves an overall accuracy of 53.33, significantly outperforming the best open-source model by 41.42%. Leveraging an effective instruction synthesis method and an adaptive model architecture, VIM surpasses both state-of-the-art open-source models and GPT-4V on the Event-Bench. All code, data, and models are publicly available at https://github.com/RUCAIBox/Event-Bench.


Word Matters: What Influences Domain Adaptation in Summarization?

arXiv.org Artificial Intelligence

Domain adaptation aims to enable Large Language Models (LLMs) to generalize domain datasets unseen effectively during the training phase. However, factors such as the size of the model parameters and the scale of training data are general influencers and do not reflect the nuances of domain adaptation performance. This paper investigates the fine-grained factors affecting domain adaptation performance, analyzing the specific impact of `words' in training data on summarization tasks. We propose quantifying dataset learning difficulty as the learning difficulty of generative summarization, which is determined by two indicators: word-based compression rate and abstraction level. Our experiments conclude that, when considering dataset learning difficulty, the cross-domain overlap and the performance gain in summarization tasks exhibit an approximate linear relationship, which is not directly related to the number of words. Based on this finding, predicting a model's performance on unknown domain datasets is possible without undergoing training.