ieeetransaction
Projection-Manifold Regularized Latent Diffusion for Robust General Image Fusion
This study proposes PDFuse, a robust, general, training-free image fusion framework built on pre-trained latent diffusion models with projection-manifold regularization. By redefining fusion as a diffusion inference process constrained by multiple source images, PDFuse can adapt to varied image modalities and produce high-fidelity outputs by exploiting the diffusion prior. To ensure both source consistency and full utilization of generative priors, we develop a novel projection-manifold regularization, which consists of two core mechanisms. On the one hand, the Multisource Information Consistency Projection (MICP) establishes a projection system between diffusion latent representations and source images, solved efficiently via conjugate gradients to inject multi-source information into the inference. On the other hand, the Latent Manifold-preservation Guidance (LMG) aligns the latent distribution of diffusion variables with that of the sources, guiding generation to respect the model's manifold prior.
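The abstract states that the MICP projection system is solved via conjugate gradients but does not give the system itself. As a minimal sketch of that solver only, the following runs the standard conjugate gradient iteration on a toy symmetric positive-definite system. The function name `conjugate_gradient`, the toy matrix `A`, and the `matvec` interface are illustrative assumptions, not part of PDFuse.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive-definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the initial x = 0
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # residual norm small enough: stop
            break
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Toy SPD system (hypothetical, for illustration only).
A = [[4.0, 1.0], [1.0, 3.0]]


def toy_matvec(v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]


x = conjugate_gradient(toy_matvec, [1.0, 2.0])
```

Because the solver only needs a matrix-vector product, it can in principle be applied when the operator is available as a function rather than as an explicit matrix, which is one common reason to choose conjugate gradients for large systems.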
A Driving-Style-Adaptive Framework for Vehicle Trajectory Prediction
Vehicle trajectory prediction serves as a critical enabler for autonomous navigation and intelligent transportation systems. While existing approaches predominantly focus on pattern extraction and vehicle-environment interaction modeling, they exhibit a fundamental limitation in addressing trajectory heterogeneity originating from human driving styles. This oversight constrains prediction reliability in complex real-world scenarios. To bridge this gap, we propose the Driving-Style-Adaptive (DSA) framework, which establishes the first systematic integration of heterogeneous driving behaviors into trajectory prediction models. Specifically, our framework employs a set of basis functions tailored to each driving style to approximate the trajectory patterns. By dynamically combining and adaptively adjusting the degree of these basis functions, DSA not only enhances prediction accuracy but also provides explanatory insights into the prediction process. Extensive experiments on public real-world datasets demonstrate that the DSA framework outperforms state-of-the-art methods.
Gaussian Regression-Driven Tensorized Incomplete Multi-View Clustering with Dual Manifold Regularization
Tensorized Incomplete Multi-View Clustering (TIMVC) algorithms have attracted growing attention for their ability to capture high-order correlations across multiple views. However, most existing TIMVC methods rely on simplistic noise assumptions using specific norms (e.g., ℓ1 or ℓ2,1), which fail to reflect the complex noise patterns encountered in real-world scenarios. Moreover, they primarily focus on modeling the global Euclidean structure of the tensor representation, while overlooking the preservation of local manifold structures. To address these limitations, we propose a novel approach, GaUssian regressIon-driven TIMVC with dual mAnifold Regularization (GUITAR). Specifically, we employ a Gaussian regression model to characterize complex noise distributions in a more realistic and flexible manner. Meanwhile, a dual manifold regularization is introduced in tensor representation learning, simultaneously modeling manifold information at both the view-specific and cross-view consensus levels, thereby promoting intra-view and inter-view consistency in the tensor representation. Furthermore, to better capture the intrinsic low-rank structure, we propose the high-preservation ℓδ-norm tensor rank constraint, which applies differentiated penalties to the singular values, thereby enhancing the robustness of the tensor representation. In addition, an efficient optimization algorithm is developed to solve the resulting non-convex problem with provable convergence. Extensive experiments on six datasets demonstrate that our method outperforms SOTA approaches.
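For reference, the two norms that the abstract cites as simplistic noise assumptions (the ℓ1 norm and the ℓ2,1 norm of an error matrix) can be computed as in the sketch below. The column-wise convention for the ℓ2,1 norm is an assumption here, since some papers sum over rows instead; the helper names and the toy matrix `E` are illustrative.

```python
import math


def l1_norm(M):
    """Sum of absolute values of all entries (models element-wise sparse noise)."""
    return sum(abs(v) for row in M for v in row)


def l21_norm(M):
    """Sum of the Euclidean norms of the columns (column-wise convention, assumed)."""
    n_cols = len(M[0])
    return sum(math.sqrt(sum(row[j] ** 2 for row in M)) for j in range(n_cols))


# Toy error matrix with one corrupted column (hypothetical values).
E = [[3.0, 0.0],
     [4.0, 0.0]]

l1_value = l1_norm(E)    # 3 + 4 = 7
l21_value = l21_norm(E)  # sqrt(3^2 + 4^2) + 0 = 5
```

Each norm encodes one fixed noise shape (entry-wise sparsity or column-wise sparsity), which is the limitation the abstract contrasts with its Gaussian regression model.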
Bit-swapping Oriented Twin-memory Multi-view Clustering in Lifelong Incomplete Scenarios
Despite notable improvements, current multi-view clustering (MVC) techniques generally rely on feature library mechanisms to propagate accumulated knowledge from historical views to newly arrived data, which overlooks the information pertaining to basis embedding within each view. Moreover, the mapping paradigm inevitably alters the values of learned landmarks and built affinities due to its uninterrupted nature, thereby disrupting the hierarchical cluster structures. To mitigate these two issues, we propose an algorithm named BSTM. Concretely, we first reconcile the distinct dimensions by introducing a group of specialized projectors, and then establish unified anchors for all views collected so far to capture intrinsic patterns. Afterwards, departing from per-view architectures, we devise a shared bipartite graph construction via indicators to quantify similarity, which not only avoids redundant data recalculation but also alleviates the representation distortion caused by fusion.
Scalable Cross-View Sample Alignment for Multi-View Clustering with View Structure Similarity
Most existing multi-view clustering methods aim to generate a consensus partition across all views, based on the assumption that all views share the same sample arrangement. However, in real-world scenarios, the data collected across different views is often unsynchronized, making it difficult to ensure consistent sample correspondence between views. To address this issue, we propose a scalable sample-alignment-based multi-view clustering method, referred to as SSA-MVC. Specifically, we first employ a cluster-label matching (CLM) algorithm to select the view whose clustering labels best match those of the others as the benchmark view. Then, for each of the remaining views, we construct representations of non-aligned samples by computing their similarities with aligned samples. Based on these representations, we build a similarity graph between the non-aligned samples of each view and those in the benchmark view, which serves as the alignment criterion. This alignment criterion is then integrated into a late-fusion framework to enable clustering without requiring aligned samples. Notably, the learned sample alignment matrix can be used to enhance existing multi-view clustering methods in scenarios where sample correspondence is unavailable. The effectiveness of the proposed SSA-MVC algorithm is validated through extensive experiments conducted on eight real-world multi-view datasets.
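The abstract does not specify how the cluster-label matching (CLM) step scores one view against another. One plausible reading, sketched below under stated assumptions, is to compare cluster labels on a shared subset of aligned samples, allow for arbitrary relabeling of clusters, and pick the view with the highest average agreement. The functions `label_agreement` and `pick_benchmark_view` are hypothetical names, and the brute-force search over permutations is only practical for a small number of clusters.

```python
from itertools import permutations


def label_agreement(a, b, k):
    """Best fraction of samples on which a and b agree, over all relabelings of b."""
    best = 0.0
    for perm in permutations(range(k)):
        hits = sum(1 for x, y in zip(a, b) if x == perm[y])
        best = max(best, hits / len(a))
    return best


def pick_benchmark_view(view_labels, k):
    """Return (index, scores): the view whose labels best match the others on average."""
    scores = []
    for i, a in enumerate(view_labels):
        others = [label_agreement(a, b, k)
                  for j, b in enumerate(view_labels) if j != i]
        scores.append(sum(others) / len(others))
    return max(range(len(scores)), key=scores.__getitem__), scores


# Toy labels for three views on six aligned samples (hypothetical data).
views = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 2, 2, 0, 0],   # same partition as view 0, clusters renamed
    [0, 0, 1, 2, 2, 2],   # one sample assigned differently
]
benchmark, scores = pick_benchmark_view(views, k=3)
```

For larger cluster counts, the permutation search would normally be replaced by an assignment solver such as the Hungarian algorithm.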
Diff-ICMH: Harmonizing Machine and Human Vision in Image Compression with Generative Prior
Image compression methods are usually optimized in isolation for either human perception or machine analysis tasks. We reveal fundamental commonalities between these objectives: preserving accurate semantic information is paramount, as it directly dictates the integrity of critical information for intelligent tasks and aids human understanding. Concurrently, enhanced perceptual quality not only improves visual appeal but also, by ensuring realistic image distributions, benefits semantic feature extraction for machine tasks. Based on this insight, we propose Diff-ICMH, a generative image compression framework that aims to harmonize machine and human vision in image compression. It ensures perceptual realism by leveraging generative priors and simultaneously guarantees semantic fidelity through the incorporation of Semantic Consistency loss (SC loss) during training. Additionally, we introduce the Tag Guidance Module (TGM) that leverages highly semantic image-level tags to stimulate the pre-trained diffusion model's generative capabilities, requiring minimal additional bit rates. Consequently, Diff-ICMH supports multiple intelligent tasks through a single codec and bitstream without any task-specific adaptation, while preserving high-quality visual experience for human perception. Extensive experimental results demonstrate Diff-ICMH's superiority and generalizability across diverse tasks, while maintaining visual appeal for human perception.
DQVis Dataset: Natural Language to Biomedical Visualization
Biomedical research data portals are essential resources for scientific inquiry, and interactive exploratory visualizations are an integral component for querying such data repositories. Increasingly, machine learning is being integrated into visualization systems to create natural language interfaces where questions about data can be answered with visualizations, and follow-up questions can build on the previous state. This paper introduces a framework that takes abstract low-level questions about data and a visualization grammar specification that can answer such a question, reifies them with data entities and fields that meet certain constraints, and paraphrases the question language to produce the final collection of realized data-question-visualization triplets. Furthermore, we can link these foundational elements together to construct chains of queries, visualizations, and follow-up queries. We developed an open-source review interface for evaluating the results of these datasets. We applied this framework to five biomedical research data repositories, resulting in DQVis, a dataset of 1.08 million data-question-visualization triplets and 11.4 thousand two-step question samples. Five visualization experts provided feedback on the generated dataset through our review interface. We present a summary of their input and publish the full reviews as an additional resource alongside the dataset.
Adversarial Graph Fusion for Incomplete Multi-view Semi-supervised Learning with Tensorial Imputation
Missing views remain a significant challenge in graph-based multi-view semi-supervised learning, hindering its real-world application. To address this issue, traditional methods introduce a missing indicator matrix and focus on mining partial structure among existing samples in each view for label propagation (LP). However, we argue that these disregarded missing samples sometimes induce discontinuous local structures, i.e., sub-clusters, breaking the fundamental smoothness assumption in LP. Consequently, such a Sub-Cluster Problem (SCP) would distort graph fusion and degrade classification performance. To alleviate SCP, we propose a novel incomplete multi-view semi-supervised learning method, termed AGF-TI.
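To make the smoothness assumption behind label propagation concrete, the sketch below implements a classic clamped label propagation on a small affinity graph: each unlabeled node repeatedly takes the affinity-weighted average of its neighbors' label scores, while labeled nodes stay fixed. This is a generic textbook variant, not the AGF-TI method; the function name, the toy graph `W`, and the iteration count are illustrative assumptions.

```python
def label_propagation(W, labels, n_classes, n_iter=50):
    """W: symmetric non-negative affinity matrix; labels[i] is a class index or None."""
    n = len(W)
    F = [[0.0] * n_classes for _ in range(n)]
    for i, y in enumerate(labels):
        if y is not None:
            F[i][y] = 1.0                    # one-hot scores for labeled nodes
    for _ in range(n_iter):
        new_F = []
        for i in range(n):
            degree = sum(W[i])
            if labels[i] is not None or degree == 0:
                new_F.append(F[i])           # clamp labeled (or isolated) nodes
                continue
            # Unlabeled node: affinity-weighted average of neighbor scores.
            new_F.append([sum(W[i][j] * F[j][c] for j in range(n)) / degree
                          for c in range(n_classes)])
        F = new_F
    return [max(range(n_classes), key=row.__getitem__) for row in F]


# Toy chain 0 - 1 - 2 - 3 with a weak link between nodes 1 and 2 (hypothetical).
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
predicted = label_propagation(W, [0, None, None, 1], n_classes=2)
```

In this toy graph the weak edge separates two tightly connected pairs, so each unlabeled node inherits the label of its strongly connected neighbor. Labels spread reliably only along strong edges, which is why discontinuities in the local graph structure matter for this family of methods.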
Image Stitching in Adverse Condition: A Bidirectional Consistency Learning Framework and Benchmark
Deep learning-based image stitching methods have achieved promising performance on conventional stitching datasets. However, real-world scenarios may introduce challenges such as complex weather conditions, illumination variations, and dynamic scene motion, which severely degrade image quality and lead to significant misalignment in stitching results. To solve this problem, we propose an adverse condition-tolerant image stitching network, dubbed ACDIS. We first introduce a bidirectional consistency learning framework, which ensures reliable alignment through an iterative optimization paradigm that integrates differentiable image restoration and Gaussian-distribution-encoded homography estimation. Subsequently, we incorporate motion constraints into the seamless composition network to produce robust stitching results without interference from moving scenes. We further propose the first adverse scene image stitching dataset, which covers diverse parallax and scenes under low-light, haze, and underwater environments. Extensive experiments show that the proposed method can generate visually pleasing stitched images under adverse conditions, outperforming state-of-the-art methods.