Goto

Collaborating Authors

 background information



Projection Regret: Reducing Background Bias for Novelty Detection via Diffusion Models

Neural Information Processing Systems

Novelty detection is a fundamental task of machine learning which aims to detect abnormal ( out-of-distribution (OOD)) samples. Since diffusion models have recently emerged as the de facto standard generative framework with surprising generation results, novelty detection via diffusion models has also gained much attention. Recent methods have mainly utilized the reconstruction property of in-distribution samples. However, they often suffer from detecting OOD samples that share similar background information to the in-distribution data. Based on our observation that diffusion models can any sample to an in-distribution sample with similar background information, we propose, an efficient novelty detection method that mitigates the bias of non-semantic information. To be specific, PR computes the perceptual distance between the test image and its diffusion-based projection to detect abnormality. Since the perceptual distance often fails to capture semantic changes when the background information is dominant, we cancel out the background bias by comparing it against recursive projections. Extensive experiments demonstrate that PR outperforms the prior art of generative-model-based novelty detection methods by a significant margin.


Subgoal Graph-Augmented Planning for LLM-Guided Open-World Reinforcement Learning

Fan, Shanwei, Zhang, Bin, Xu, Zhiwei, Teng, Yingxuan, Dai, Siqi, Cheng, Lin, Fan, Guoliang

arXiv.org Artificial Intelligence

Large language models (LLMs) offer strong high-level planning capabilities for reinforcement learning (RL) by decomposing tasks into subgoals. However, their practical utility is limited by poor planning-execution alignment, which reflects a critical gap between abstract plans and actionable, environment-compatible behaviors. This misalignment arises from two interrelated limitations: (1) LLMs often produce subgoals that are semantically plausible but infeasible or irrelevant in the target environment due to insufficient grounding in environment-specific knowledge, and (2) single-LLM planning conflates generation with self-verification, resulting in overconfident yet unreliable subgoals that frequently fail during execution. To address these challenges, we propose Subgoal Graph-Augmented Actor-Critic-Refiner (SGA-ACR), a framework that integrates an environment-specific subgoal graph and structured entity knowledge with a multi-LLM planning pipeline that explicitly separates generation, critique, and refinement to produce executable and verifiable subgoals. A subgoal tracker further monitors execution progress, provides auxiliary rewards, and adaptively updates the subgoal graph to maintain alignment between plans and actions. Experimental results on 22 diverse tasks in the open-world game "Crafter" demonstrate the effectiveness of our proposed method.


Supplementary Material Appendix

Neural Information Processing Systems

In contrast, when only the cycle-ERF covered by specific objects is taken into account, our method performs better than baseline.


Projection Regret: Reducing Background Bias for Novelty Detection via Diffusion Models

Neural Information Processing Systems

Since diffusion models have recently emerged as the de facto standard generative framework with surprising generation results, novelty detection via diffusion models has also gained much attention.


Vision Language Models Are Not (Yet) Spelling Correctors

Liang, Junhong, Zhang, Bojun

arXiv.org Artificial Intelligence

Spelling correction from visual input poses unique challenges for vision language models (VLMs), as it requires not only detecting but also correcting textual errors directly within images. We present ReViCo (Real Visual Correction), the first benchmark that systematically evaluates VLMs on real-world visual spelling correction across Chinese and English. ReViCo contains naturally occurring errors collected from real-world image data and supports fine-grained evaluation at both image and token levels. Through comprehensive experiments on representative cascaded (Qwen) and native (InternVL) open-source models, as well as closed-source systems (GPT-4o, Claude), we show that current VLMs fall significantly short of human performance, particularly in correction. To address these limitations, we explore two solution paradigms: a Joint OCR-Correction pipeline and a Background Information enhanced approach, both of which yield consistent performance gains. Our analysis highlights fundamental limitations of existing architectures and provides actionable insights for advancing multimodal spelling correction.


Symphony: A Decentralized Multi-Agent Framework for Scalable Collective Intelligence

Wang, Ji, Chen, Kashing, Song, Xinyuan, Zhang, Ke, Ai, Lynn, Yang, Eric, Shi, Bill

arXiv.org Artificial Intelligence

Most existing Large Language Model (LLM)-based agent frameworks rely on centralized orchestration, incurring high deployment costs, rigid communication topologies, and limited adaptability. To address these challenges, we introduce Symphony, a decentralized multi-agent system which enables lightweight LLMs on consumer-grade GPUs to coordinate. Symphony introduces three key mechanisms: (1) a decentralized ledger that records capabilities, (2) a Beacon-selection protocol for dynamic task allocation, and (3) weighted result voting based on CoTs. This design forms a privacy-saving, scalable, and fault-tolerant orchestration with low overhead. Empirically, Symphony outperforms existing baselines on reasoning benchmarks, achieving substantial accuracy gains and demonstrating robustness across models of varying capacities.


Using Large Language Models for Legal Decision-Making in Austrian Value-Added Tax Law: An Experimental Study

Luketina, Marina, Benkel, Andrea, Schuetz, Christoph G.

arXiv.org Artificial Intelligence

This paper provides an experimental evaluation of the capability of large language models (LLMs) to assist in legal decision-making within the framework of Austrian and European Union value-added tax (VAT) law. In tax consulting practice, clients often describe cases in natural language, making LLMs a prime candidate for supporting automated decision-making and reducing the workload of tax professionals. Given the requirement for legally grounded and well-justified analyses, the propensity of LLMs to hallucinate presents a considerable challenge. The experiments focus on two common methods for enhancing LLM performance: fine-tuning and retrieval-augmented generation (RAG). In this study, these methods are applied on both textbook cases and real-world cases from a tax consulting firm to systematically determine the best configurations of LLM-based systems and assess the legal-reasoning capabilities of LLMs. The findings highlight the potential of using LLMs to support tax consultants by automating routine tasks and providing initial analyses, although current prototypes are not ready for full automation due to the sensitivity of the legal domain. The findings indicate that LLMs, when properly configured, can effectively support tax professionals in VAT tasks and provide legally grounded justifications for decisions. However, limitations remain regarding the handling of implicit client knowledge and context-specific documentation, underscoring the need for future integration of structured background information.


Elucidating and Endowing the Diffusion Training Paradigm for General Image Restoration

Lu, Xin, Fu, Xueyang, Xiao, Jie, Fan, Zihao, Zhu, Yurui, Zha, Zheng-Jun

arXiv.org Artificial Intelligence

While diffusion models demonstrate strong generative capabilities in image restoration (IR) tasks, their complex architectures and iterative processes limit their practical application compared to mainstream reconstruction-based general ordinary IR networks. Existing approaches primarily focus on optimizing network architecture and diffusion paths but overlook the integration of the diffusion training paradigm within general ordinary IR frameworks. To address these challenges, this paper elucidates key principles for adapting the diffusion training paradigm to general IR training through systematic analysis of time-step dependencies, network hierarchies, noise-level relationships, and multi-restoration task correlations, proposing a new IR framework supported by diffusion-based training. To enable IR networks to simultaneously restore images and model generative representations, we introduce a series of regularization strategies that align diffusion objectives with IR tasks, improving generalization in single-task scenarios. Furthermore, recognizing that diffusion-based generation exerts varying influences across different IR tasks, we develop an incremental training paradigm and task-specific adaptors, further enhancing performance in multi-task unified IR. Experiments demonstrate that our method significantly improves the generalization of IR networks in single-task IR and achieves superior performance in multi-task unified IR. Notably, the proposed framework can be seamlessly integrated into existing general IR architectures.


Resolving Conflicting Evidence in Automated Fact-Checking: A Study on Retrieval-Augmented LLMs

Ge, Ziyu, Wu, Yuhao, Chin, Daniel Wai Kit, Lee, Roy Ka-Wei, Cao, Rui

arXiv.org Artificial Intelligence

Large Language Models (LLMs) augmented with retrieval mechanisms have demonstrated significant potential in fact-checking tasks by integrating external knowledge. However, their reliability decreases when confronted with conflicting evidence from sources of varying credibility. This paper presents the first systematic evaluation of Retrieval-Augmented Generation (RAG) models for fact-checking in the presence of conflicting evidence. To support this study, we introduce \textbf{CONFACT} (\textbf{Con}flicting Evidence for \textbf{Fact}-Checking) (Dataset available at https://github.com/zoeyyes/CONFACT), a novel dataset comprising questions paired with conflicting information from various sources. Extensive experiments reveal critical vulnerabilities in state-of-the-art RAG methods, particularly in resolving conflicts stemming from differences in media source credibility. To address these challenges, we investigate strategies to integrate media background information into both the retrieval and generation stages. Our results show that effectively incorporating source credibility significantly enhances the ability of RAG models to resolve conflicting evidence and improve fact-checking performance.