NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction
Reconstruction of static visual stimuli from non-invasive brain activity (fMRI) has achieved great success, owing to advanced deep learning models such as CLIP and Stable Diffusion. However, research on fMRI-to-video reconstruction remains limited, since decoding the spatiotemporal perception of continuous visual experiences is formidably challenging. We contend that the key to addressing these challenges lies in accurately decoding both the high-level semantics and the low-level perception flows that the brain perceives in response to video stimuli. To this end, we propose NeuroClips, an innovative framework to decode high-fidelity and smooth video from fMRI. NeuroClips utilizes a semantics reconstructor to reconstruct video keyframes, guiding semantic accuracy and consistency, and employs a perception reconstructor to capture low-level perceptual details, ensuring video smoothness. During inference, it adopts a pre-trained T2V diffusion model injected with both keyframes and low-level perception flows for video reconstruction. Evaluated on a publicly available fMRI-video dataset, NeuroClips achieves smooth, high-fidelity video reconstruction of up to 6 s at 8 FPS, gaining significant improvements over state-of-the-art models on various metrics, e.g., a 128% improvement in SSIM and an 81% improvement in spatiotemporal metrics.
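A minimal sketch of the two-stream inference the abstract describes. All module names and signatures here (`SemanticsReconstructor`, `PerceptionReconstructor`, the `t2v_pipeline` keyword arguments) are illustrative assumptions, not the authors' actual API:

```python
import torch

# Hypothetical stand-ins for the two NeuroClips streams; names and
# signatures are assumptions for illustration only.
class SemanticsReconstructor(torch.nn.Module):
    """Maps an fMRI frame to a keyframe embedding in CLIP space."""
    def __init__(self, fmri_dim: int, clip_dim: int = 768):
        super().__init__()
        self.proj = torch.nn.Linear(fmri_dim, clip_dim)

    def forward(self, fmri: torch.Tensor) -> torch.Tensor:
        return self.proj(fmri)            # (B, clip_dim) keyframe embedding

class PerceptionReconstructor(torch.nn.Module):
    """Maps an fMRI frame to a coarse low-level perception flow."""
    def __init__(self, fmri_dim: int, flow_dim: int = 256):
        super().__init__()
        self.proj = torch.nn.Linear(fmri_dim, flow_dim)

    def forward(self, fmri: torch.Tensor) -> torch.Tensor:
        return self.proj(fmri)            # (B, flow_dim) perception flow

def reconstruct_video(fmri, t2v_pipeline, sem, per):
    """Inference: inject both keyframe semantics and the perception
    flow into a pre-trained T2V diffusion pipeline (assumed to accept
    both conditioning signals)."""
    keyframe_emb = sem(fmri)
    flow = per(fmri)
    return t2v_pipeline(image_embeds=keyframe_emb, motion_cond=flow,
                        num_frames=48)    # ~6 s at 8 FPS
```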
MVGamba: Unify 3D Content Generation as State Space Sequence Modeling
Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in under a second by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as the 3D representation for improved visual quality and rendering efficiency. However, we observe that existing Gaussian reconstruction models often suffer from multi-view inconsistency and blurred textures. We attribute this to the compromise of multi-view information propagation in favor of adopting powerful yet computationally intensive architectures (e.g., Transformers). To address this issue, we introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor based on the RNN-like State Space Model (SSM). Our Gaussian reconstructor propagates causal context containing multi-view information for cross-view self-refinement while generating a long sequence of Gaussians for fine-detail modeling with linear complexity. With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts. Extensive experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with only approximately $0.1\times$ the model size.
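A toy sketch of the core idea: a causal recurrence over concatenated multi-view tokens that emits one Gaussian per token in linear time, so later views see earlier views' context. This is a generic RNN-style stand-in for the SSM reconstructor, not MVGamba's implementation:

```python
import torch

class SSMGaussianReconstructor(torch.nn.Module):
    """Causal scan over multi-view image tokens emitting a long
    sequence of 3D Gaussian parameters; an illustrative stand-in for
    a Mamba-style SSM block, not the authors' architecture."""
    def __init__(self, token_dim: int, state_dim: int = 512,
                 gauss_dim: int = 14):  # xyz + scale + rotation + opacity + rgb
        super().__init__()
        self.A = torch.nn.Linear(state_dim, state_dim, bias=False)
        self.B = torch.nn.Linear(token_dim, state_dim, bias=False)
        self.head = torch.nn.Linear(state_dim, gauss_dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, L, token_dim), with all views concatenated along L
        # so the causal state carries earlier views into later ones.
        B, L, _ = tokens.shape
        h = tokens.new_zeros(B, self.A.in_features)
        gaussians = []
        for t in range(L):                 # linear in sequence length
            h = torch.tanh(self.A(h) + self.B(tokens[:, t]))
            gaussians.append(self.head(h)) # one Gaussian per token
        return torch.stack(gaussians, dim=1)  # (B, L, gauss_dim)
```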
It is not the corruption distribution itself that ultimately generates new, realistic objects; rather, it is the repeated application of the corruption and
We thank the reviewers for the valuable feedback and address specific comments below. We plan to expand Section 2.1 with additional explanation to make the paper more self-contained. Although the samples are not i.i.d., no burn-in or thinning is used. Defining such moves does require some domain expertise. We plan to include updated Guacamol results in the paper.
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Asia > Singapore (0.04)
- North America > United States > Oklahoma > Beaver County (0.04)
- North America > Canada > British Columbia (0.04)
- (2 more...)
Reconstruction and Secrecy under Approximate Distance Queries
Moran, Shay, Nesterova, Elizaveta
Consider the task of locating an unknown target point using approximate distance queries: in each round, a reconstructor selects a query point and receives a noisy version of its distance to the target. This problem arises naturally in various contexts ranging from localization in GPS and sensor networks to privacy-aware data access, and spans a wide variety of metric spaces. It is relevant from the perspective of both the reconstructor (seeking accurate recovery) and the responder (aiming to limit information disclosure, e.g., for privacy or security reasons). We study this reconstruction game through a learning-theoretic lens, focusing on the rate and limits of the best possible reconstruction error. Our first result provides a tight geometric characterization of the optimal error in terms of the Chebyshev radius, a classical concept from geometry. This characterization applies to all compact metric spaces (in fact, even to all totally bounded spaces) and yields explicit formulas for natural metric spaces. Our second result addresses the asymptotic behavior of reconstruction, distinguishing between pseudo-finite spaces -- where the optimal error is attained after finitely many queries -- and spaces where the approximation curve exhibits nontrivial decay. We characterize pseudo-finiteness for convex Euclidean spaces.
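The characterization is stated in terms of the Chebyshev radius; below is the standard definition with a one-line worked example (the paper's exact formulation may differ in details):

```latex
% Chebyshev radius of a subset A of a metric space (M, d): the
% smallest radius of a ball, centered anywhere in M, that covers A.
\[
  r(A) \;=\; \inf_{x \in M} \; \sup_{a \in A} \, d(x, a)
\]
% Worked example: for A = [0, 1] \subset \mathbb{R} with the usual
% metric, the infimum is attained at x = 1/2, giving r(A) = 1/2 --
% the best guaranteed reconstruction error when the target is only
% known to lie in A.
```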
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (3 more...)
TISDiSS: A Training-Time and Inference-Time Scalable Framework for Discriminative Source Separation
Feng, Yongsheng, Xu, Yuetonghui, Luo, Jiehui, Liu, Hongjia, Li, Xiaobing, Yu, Feng, Li, Wei
Source separation is a fundamental task in speech, music, and audio processing, and it also provides cleaner and larger data for training generative models. However, improving separation performance in practice often depends on increasingly large networks, inflating training and deployment costs. Motivated by recent advances in inference-time scaling for generative modeling, we propose Training-Time and Inference-Time Scalable Discriminative Source Separation (TISDiSS), a unified framework that integrates early-split multi-loss supervision, shared-parameter design, and dynamic inference repetitions. TISDiSS enables flexible speed-performance trade-offs by adjusting inference depth without retraining additional models. We further provide systematic analyses of architectural and training choices and show that training with more inference repetitions improves shallow-inference performance, benefiting low-latency applications. Experiments on standard speech separation benchmarks demonstrate state-of-the-art performance with a reduced parameter count, establishing TISDiSS as a scalable and practical framework for adaptive source separation. Code is available at https://github.com/WingSingFung/TISDiSS.
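A minimal sketch of the inference-time scaling idea: one shared-parameter refinement block applied a variable number of times, trading latency for quality without retraining. The block below is a generic residual MLP placeholder, not the released TISDiSS architecture (see the linked repository for that):

```python
import torch

class SharedSeparationBlock(torch.nn.Module):
    """One refinement step, reused with shared weights at every
    repetition; an illustrative stand-in for the repeated module."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, dim), torch.nn.ReLU(),
            torch.nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)            # residual refinement

def separate(mixture_feats: torch.Tensor,
             block: SharedSeparationBlock,
             repetitions: int) -> torch.Tensor:
    """Inference-time scaling: more repetitions improve separation,
    fewer reduce latency, with no additional models trained."""
    est = mixture_feats
    for _ in range(repetitions):
        est = block(est)
    return est
```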
- Asia > China (0.28)
- North America > United States (0.14)
AuthPrint: Fingerprinting Generative Models Against Malicious Model Providers
Generative models are increasingly adopted in high-stakes domains, yet current deployments offer no mechanisms to verify whether a given output truly originates from the certified model. We address this gap by extending model fingerprinting techniques beyond the traditional collaborative setting to one where the model provider itself may act adversarially, replacing the certified model with a cheaper or lower-quality substitute. To our knowledge, this is the first work to study fingerprinting for provenance attribution under such a threat model. Our approach introduces a trusted verifier that, during a certification phase, extracts hidden fingerprints from the authentic model's output space and trains a detector to recognize them. During verification, this detector can determine whether new outputs are consistent with the certified model, without requiring specialized hardware or model modifications. In extensive experiments, our methods achieve near-zero FPR@95%TPR on both GANs and diffusion models, and remain effective even against subtle architectural or training changes. Furthermore, the approach is robust to adaptive adversaries that actively manipulate outputs in an attempt to evade detection.

Recent advances in generative AI have led to the widespread deployment of generative models across various domains, with providers of generative AI services increasingly monetizing their models by offering subscription-based access. However, this rapid adoption has raised serious concerns about the risks posed by these models, particularly in safety-critical domains such as healthcare and defense, where erroneous model outputs can have disastrous consequences [1]. In response, policymakers are introducing legal frameworks to regulate the use of AI and, in particular, the deployment of generative models. For instance, the European Union's AI Act mandates independent, periodic audits for "high-risk" AI systems deployed in domains such as healthcare, education, employment, and critical infrastructure [2]. This requirement to pass or be certified by an audit raises a critical question: how can users verify that a given output indeed originated from the audited model?
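A minimal sketch of the certify/verify protocol shape described above. The paper trains a learned detector over hidden fingerprint features; the simple per-coordinate Gaussian score below is a stand-in, and `model_sample` is a hypothetical callable returning one model output as an array:

```python
import numpy as np

def certify(model_sample, n_samples=10_000, n_probe=64, rng=None):
    """Certification phase (illustrative): sample the certified
    model, read secret 'fingerprint' coordinates chosen by the
    verifier, and record their statistics as a simple detector."""
    rng = rng or np.random.default_rng(0)
    outs = np.stack([model_sample(rng) for _ in range(n_samples)])
    flat = outs.reshape(n_samples, -1)
    probe_idx = rng.choice(flat.shape[1], size=n_probe, replace=False)
    fp = flat[:, probe_idx]               # secret fingerprint coordinates
    return probe_idx, fp.mean(0), fp.std(0) + 1e-8

def verify(output, probe_idx, mu, sigma, threshold=4.0):
    """Verification phase: flag outputs whose fingerprint statistics
    deviate from those of the certified model."""
    fp = output.reshape(-1)[probe_idx]
    score = np.abs((fp - mu) / sigma).mean()
    return score < threshold              # True = consistent with model
```

Keeping `probe_idx` secret from the provider is what makes substitution detectable: a cheaper substitute model has no way to know which output coordinates must match the certified distribution.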
- Europe (0.34)
- North America > United States (0.14)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government > Europe Government (0.34)