codegraph
On the Problem of Consistent Anomalies in Zero-Shot Anomaly Detection
Zero-shot anomaly classification and segmentation (AC/AS) aim to detect anomalous samples and regions without any training data, a capability increasingly crucial in industrial inspection and medical imaging. This dissertation aims to investigate the core challenges of zero-shot AC/AS and presents principled solutions rooted in theory and algorithmic design. We first formalize the problem of consistent anomalies, a failure mode in which recurring similar anomalies systematically bias distance-based methods. By analyzing the statistical and geometric behavior of patch representations from pre-trained Vision Transformers, we identify two key phenomena - similarity scaling and neighbor-burnout - that describe how relationships among normal patches change with and without consistent anomalies in settings characterized by highly similar objects. We then introduce CoDeGraph, a graph-based framework for filtering consistent anomalies built on the similarity scaling and neighbor-burnout phenomena. Through multi-stage graph construction, community detection, and structured refinement, CoDeGraph effectively suppresses the influence of consistent anomalies. Next, we extend this framework to 3D medical imaging by proposing a training-free, computationally efficient volumetric tokenization strategy for MRI data. This enables a genuinely zero-shot 3D anomaly detection pipeline and shows that volumetric anomaly segmentation is achievable without any 3D training samples. Finally, we bridge batch-based and text-based zero-shot methods by demonstrating that CoDeGraph-derived pseudo-masks can supervise prompt-driven vision-language models. Together, this dissertation provides theoretical understanding and practical solutions for the zero-shot AC/AS problem.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > South Holland > Leiden (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
CodeGraph: Enhancing Graph Reasoning of LLMs with Code
Cai, Qiaolong, Wang, Zhaowei, Diao, Shizhe, Kwok, James, Song, Yangqiu
With the increasing popularity of large language models (LLMs), reasoning on basic graph algorithm problems is an essential intermediate step in assessing their abilities to process and infer complex graph reasoning tasks. Existing methods usually convert graph-structured data to textual descriptions and then use LLMs for reasoning and computation. However, LLMs often produce computation errors on arithmetic parts in basic graph algorithm problems, such as counting number of edges. In addition, they struggle to control or understand the output of the reasoning process, raising concerns about whether LLMs are simply guessing. In this paper, we introduce CodeGraph, a method that encodes graph problem solutions as code. The methods solve new graph problems by learning from exemplars, generating programs, and executing them via a program interpreter. Using the few-shot setting, we evaluate CodeGraph with the base LLM being GPT-3.5 Turbo, Llama3-70B Instruct, Mixtral-8x22B Instruct, and Mixtral-8x7B Instruct. Experimental results on six tasks with six graph encoding methods in the GraphQA dataset demonstrate that CodeGraph can boost performance on graph reasoning tasks inside LLMs by 1.3% to 58.6%, depending on the task. Compared to the existing methods, CodeGraph demonstrates strong performance on arithmetic problems in graph tasks and offers a more controllable and interpretable approach to the reasoning process.
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- Asia > China > Hong Kong (0.04)
- Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)
Source Code is a Graph, Not a Sequence: A Cross-Lingual Perspective on Code Clone Detection
Rahaman, Mohammed Ataaur, Ive, Julia
Source code clone detection is the task of finding code fragments that have the same or similar functionality, but may differ in syntax or structure. This task is important for software maintenance, reuse, and quality assurance (Roy et al. 2009). However, code clone detection is challenging, as source code can be written in different languages, domains, and styles. In this paper, we argue that source code is inherently a graph, not a sequence, and that graph-based methods are more suitable for code clone detection than sequence-based methods. We compare the performance of two state-of-the-art models: CodeBERT (Feng et al. 2020), a sequence-based model, and CodeGraph (Yu et al. 2023), a graph-based model, on two benchmark data-sets: BCB (Svajlenko et al. 2014) and PoolC (PoolC no date). We show that CodeGraph outperforms CodeBERT on both data-sets, especially on cross-lingual code clones. To the best of our knowledge, this is the first work to demonstrate the superiority of graph-based methods over sequence-based methods on cross-lingual code clone detection.
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.46)