Learning Discrete Latent Variable Structures with Tensor Rank Conditions Zhengming Chen
Unobserved discrete data are ubiquitous in many scientific disciplines, and learning the causal structure of these latent variables is crucial for uncovering data patterns. Most studies focus on linear latent variable models or impose strict constraints on latent structures, and so fail to address discrete data involving non-linear relationships or complex latent structures.
Linear Uncertainty Quantification of Graphical Model Inference
Uncertainty Quantification (UQ) is vital for decision makers as it offers insights into the potential reliability of data and models, enabling more informed and risk-aware decision-making. Graphical models, capable of representing data with complex dependencies, are widely used across domains. Existing sampling-based UQ methods are unbiased but cannot guarantee convergence and are time-consuming on large-scale graphs. Fast UQ methods for graphical models offer closed-form solutions and convergence guarantees but underestimate uncertainty. We propose LinUProp, a UQ method that uses a novel linear propagation of uncertainty to model uncertainty among related nodes additively instead of multiplicatively, offering linear scalability, guaranteed convergence, and closed-form solutions without underestimating uncertainty. Theoretically, we decompose the expected prediction error of the graphical model and prove that the uncertainty computed by LinUProp is the generalized variance component of the decomposition. Experimentally, we demonstrate that LinUProp is consistent with the sampling-based method while providing linear scalability and fast convergence. Moreover, LinUProp outperforms competitors in uncertainty-based active learning on four real-world graph datasets, achieving higher accuracy with a lower labeling budget.
Appendix A: Extensive Datasets in Touchstone. A.1 Construction of AbdomenAtlas 1.0. A.2 Domain Shift in TotalSegmentator
The naive aggregation of these public datasets results in a database with partial and incomplete labels, e.g., LiTS only has labels for the liver and its tumors, and KiTS only has labels for the kidneys and their tumors. Conversely, our AbdomenAtlas 1.0 is fully annotated, offering detailed per-voxel labels for all 9 organs within each CT scan. We detected and removed duplicated CT scans across public datasets like LiTS and FLARE'23. Duplicate scans were identified by generating a 3D perceptual hash [83] for each image in the dataset. Duplicates were detected by comparing the similarity of these hashes, and the detections were confirmed through manual inspection of CT scans with high perceptual hash similarities. After aggregating all datasets and removing duplicates, we obtained a total of 5,195 fully annotated CT scans. Part of TotalSegmentator is included in the AbdomenAtlas dataset (N=485), because it is contained in FLARE, one of the AbdomenAtlas constituents.
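The hash-and-compare step described above can be illustrated with a minimal sketch. It assumes a simple "average hash" extended to three dimensions, which is not necessarily the 3D perceptual hash of [83]; the function names, the grid size, and the nested-list volume layout are all hypothetical choices made for illustration.

```python
from typing import List

Volume = List[List[List[float]]]  # voxel intensities indexed as [z][y][x]


def block_means(vol: Volume, grid: int) -> List[float]:
    """Average the volume over a grid x grid x grid partition of its extent."""
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    if min(nz, ny, nx) < grid:
        raise ValueError("every volume dimension must be at least `grid`")
    means = []
    for gz in range(grid):
        z0, z1 = gz * nz // grid, (gz + 1) * nz // grid
        for gy in range(grid):
            y0, y1 = gy * ny // grid, (gy + 1) * ny // grid
            for gx in range(grid):
                x0, x1 = gx * nx // grid, (gx + 1) * nx // grid
                vals = [vol[z][y][x]
                        for z in range(z0, z1)
                        for y in range(y0, y1)
                        for x in range(x0, x1)]
                means.append(sum(vals) / len(vals))
    return means


def average_hash_3d(vol: Volume, grid: int = 2) -> int:
    """Assumed hash: one bit per block, set when the block mean exceeds
    the mean of all block means. Returns the bits packed into an int."""
    means = block_means(vol, grid)
    threshold = sum(means) / len(means)
    bits = 0
    for m in means:
        bits = (bits << 1) | (1 if m > threshold else 0)
    return bits


def hash_similarity(a: int, b: int, n_bits: int) -> float:
    """Fraction of matching bits (1.0 means identical hashes)."""
    hamming = bin(a ^ b).count("1")
    return 1.0 - hamming / n_bits
```

Scan pairs whose similarity exceeds a chosen threshold would then be passed to manual inspection, mirroring the confirmation step in the text. The threshold itself is not given in the source and would need to be tuned.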
Boundary Decomposition for Nadir Objective Vector Estimation
The nadir objective vector plays a key role in solving multi-objective optimization problems (MOPs), where it is often used to normalize the objective space and guide the search. Current methods for estimating the nadir objective vector perform effectively only on specific MOPs. This paper reveals the limitations of these methods: exact methods work only on discrete MOPs, while heuristic methods cannot deal with MOPs that have a complicated feasible objective region. To fill this gap, we propose a general and rigorous method, namely boundary decomposition for nadir objective vector estimation (BDNE).
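To make the role of the nadir objective vector concrete, the following sketch computes it for a finite sample of objective vectors under minimization and uses it to normalize the objective space. It follows the standard definition (the per-objective worst value over the Pareto front) and is not an implementation of BDNE; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point in the sample dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def ideal_and_nadir(points: List[Point]) -> Tuple[List[float], List[float]]:
    """Ideal = per-objective minimum over the front;
    nadir = per-objective maximum over the front (not over all points)."""
    front = pareto_front(points)
    m = len(front[0])
    ideal = [min(p[i] for p in front) for i in range(m)]
    nadir = [max(p[i] for p in front) for i in range(m)]
    return ideal, nadir


def normalize(point: Point, ideal: Point, nadir: Point) -> List[float]:
    """Map each objective to [0, 1] on the front; degenerate axes map to 0."""
    return [(f - z) / (n - z) if n > z else 0.0
            for f, z, n in zip(point, ideal, nadir)]
```

For the sample `[(1, 5), (2, 3), (4, 1), (5, 5), (3, 4)]`, the last two points are dominated, so the nadir is `[4, 5]` rather than the per-objective maximum `[5, 5]` over all points. This gap between the two is why the nadir vector is hard to estimate when the Pareto front is not known in advance.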
Towards Safe Concept Transfer of Multi-Modal Diffusion via Causal Representation Editing Peiran Dong, Bingjie Wang, Song Guo, Junxiao Wang
Vision-language-to-image (VL2I) diffusion generation has recently made significant progress. While generating images from broad vision-language inputs holds promise, it also raises concerns about potential misuse, such as copying artistic styles without permission, which could have legal and social consequences.
CRAG - Comprehensive RAG Benchmark Xiao Yang
Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate the knowledge deficiencies of Large Language Models (LLMs). Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamism ranging from years to seconds. Our evaluation of this benchmark highlights the gap to fully trustworthy QA.
StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video Sequences
Prior multi-frame optical flow methods typically estimate flow repeatedly in a pairwise manner, leading to significant computational redundancy. To mitigate this, we implement a Streamlined In-batch Multi-frame (SIM) pipeline, specifically tailored to video inputs to minimize redundant calculations. It enables the simultaneous prediction of successive unidirectional flows in a single forward pass, boosting processing speed by 44.43% and reaching efficiencies on par with two-frame networks. Moreover, we investigate various spatiotemporal modeling methods for optical flow estimation within this pipeline. Notably, we propose a simple yet highly effective parameter-efficient Integrative spatiotemporal Coherence (ISC) modeling method, alongside a lightweight Global Temporal Regressor (GTR) to harness temporal cues. The proposed ISC and GTR bring powerful spatiotemporal modeling capabilities and significantly enhance accuracy, including in occluded areas, while adding modest computations to the SIM pipeline. Compared to the baseline, our approach, StreamFlow, achieves performance enhancements of 15.45% and 11.37% on the Sintel clean and final test sets respectively, with gains of 15.53% and 10.77% on occluded regions and only a 1.11% rise in latency. Furthermore, StreamFlow exhibits state-of-the-art cross-dataset testing results on Sintel and KITTI, demonstrating its robust cross-domain generalization capabilities. The code is available here.
UDC: A Unified Neural Divide-and-Conquer Framework for Large-Scale Combinatorial Optimization Problems Xialiang Tong
Single-stage neural combinatorial optimization solvers have achieved near-optimal results on various small-scale combinatorial optimization (CO) problems without requiring expert knowledge. However, these solvers exhibit significant performance degradation when applied to large-scale CO problems. Recently, two-stage neural methods motivated by divide-and-conquer strategies have shown efficiency in addressing large-scale CO problems. Nevertheless, the performance of these methods relies heavily on problem-specific heuristics in either the dividing or the conquering procedure, which limits their applicability to general CO problems. Moreover, these methods employ separate training schemes and ignore the interdependencies between the dividing and conquering strategies, often leading to sub-optimal solutions. To tackle these drawbacks, this article develops a unified neural divide-and-conquer framework (i.e., UDC) for solving general large-scale CO problems. UDC offers a Divide-Conquer-Reunion (DCR) training method to eliminate the negative impact of a sub-optimal dividing policy. Employing a high-efficiency Graph Neural Network (GNN) for global instance dividing and a fixed-length sub-path solver for conquering divided sub-problems, the proposed UDC framework demonstrates extensive applicability, achieving superior performance in 10 representative large-scale CO problems.