Tian, Lei
S$^2$AG-Vid: Enhancing Multi-Motion Alignment in Video Diffusion Models via Spatial and Syntactic Attention-Based Guidance
Li, Yuanhang, Mao, Qi, Chen, Lan, Fang, Zhen, Tian, Lei, Xiao, Xinyan, Jin, Libiao, Wu, Hua
Recent advancements in text-to-video (T2V) generation using diffusion models have garnered significant attention. However, existing T2V models primarily focus on simple scenes featuring a single object performing a single motion. Challenges arise in scenarios involving multiple objects with distinct motions, often leading to incorrect video-text alignment between subjects and their corresponding motions. To address this challenge, we propose \textbf{S$^2$AG-Vid}, a training-free inference-stage optimization method that improves the alignment of multiple objects with their corresponding motions in T2V models. S$^2$AG-Vid first applies a spatial position-based cross-attention (CA) constraint in the early stages of the denoising process, encouraging multiple nouns to attend distinctly to the correct subject regions. To enhance motion-subject binding, we then apply a syntax-guided contrastive constraint in the subsequent denoising phase, aimed at improving the correlation between the CA maps of verbs and their corresponding nouns. Both qualitative and quantitative evaluations demonstrate that the proposed framework significantly outperforms baseline approaches, producing higher-quality videos with improved subject-motion consistency.
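As a rough illustration of the kind of attention-based guidance described above, the sketch below (PyTorch, with hypothetical token-indexed CA maps) computes a cosine-distance penalty between each verb's cross-attention map and that of its syntactically linked noun; in an inference-time scheme, the gradient of such a loss with respect to the latent would drive the update. This is an assumption-laden sketch, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def syntactic_alignment_loss(ca_maps, noun_verb_pairs):
    """Encourage each verb's cross-attention (CA) map to correlate with the CA map
    of its linked noun via a simple cosine-distance penalty.
    ca_maps: dict token_index -> (H, W) attention map (hypothetical layout)."""
    loss = torch.tensor(0.0)
    for noun_idx, verb_idx in noun_verb_pairs:
        noun = F.normalize(ca_maps[noun_idx].flatten(), dim=0)
        verb = F.normalize(ca_maps[verb_idx].flatten(), dim=0)
        loss = loss + (1.0 - torch.dot(noun, verb))
    return loss / max(len(noun_verb_pairs), 1)

# Toy usage with random CA maps; in practice the maps come from the denoising
# UNet, and the gradient of this loss w.r.t. the latent steers the denoising step.
maps = {i: torch.rand(16, 16) for i in range(6)}
print(syntactic_alignment_loss(maps, [(1, 2), (4, 5)]))
```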
HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models
Wang, Hanzhang, Wang, Haoran, Yang, Jinze, Yu, Zhongrui, Xie, Zeke, Tian, Lei, Xiao, Xinyan, Jiang, Junjun, Liu, Xianming, Sun, Mingming
The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image/video. Existing methods usually focus on pursuing the balance between style and content while ignoring the significant demand for flexible and customized stylization results, thereby limiting their practical application. To address this critical issue, we propose a novel AST approach, namely HiCAST, which can explicitly customize the stylization results according to various sources of semantic cues. Specifically, our model is built on the Latent Diffusion Model (LDM) and elaborately designed to absorb content and style instances as conditions of the LDM. It is characterized by the introduction of a \textit{Style Adapter}, which allows users to flexibly manipulate the output by aligning multi-level style information with the intrinsic knowledge in the LDM. Lastly, we further extend our model to perform video AST. A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency while maintaining stylization strength. Qualitative and quantitative comparisons, as well as comprehensive user studies, demonstrate that HiCAST outperforms existing SoTA methods in generating visually plausible stylization results.
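To make the adapter idea concrete, here is a minimal, hypothetical sketch of how multi-level style features could be projected and injected into diffusion-UNet block features with user-controllable weights; the dimensions, the 1x1-convolution projection, and the additive injection scheme are illustrative assumptions, not HiCAST's actual design.

```python
import torch
import torch.nn as nn

class StyleAdapter(nn.Module):
    """Sketch: project style features at several scales and add them to the
    matching UNet block features, each scaled by a user-chosen weight."""
    def __init__(self, style_dims=(64, 128, 256), unet_dims=(320, 640, 1280)):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Conv2d(s, u, kernel_size=1) for s, u in zip(style_dims, unet_dims)
        )

    def forward(self, unet_feats, style_feats, weights):
        # unet_feats / style_feats: per-level feature maps with matching spatial
        # sizes; weights: per-level scalars controlling stylization strength.
        return [f + w * p(s)
                for f, s, w, p in zip(unet_feats, style_feats, weights, self.proj)]

# Toy usage with random features at three resolutions
adapter = StyleAdapter()
unet_feats = [torch.randn(1, u, r, r) for u, r in zip((320, 640, 1280), (32, 16, 8))]
style_feats = [torch.randn(1, s, r, r) for s, r in zip((64, 128, 256), (32, 16, 8))]
out = adapter(unet_feats, style_feats, weights=(1.0, 0.8, 0.5))
```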
Local Conditional Neural Fields for Versatile and Generalizable Large-Scale Reconstructions in Computational Imaging
Wang, Hao, Zhu, Jiabei, Li, Yunzhe, Yang, QianWan, Tian, Lei
Deep learning has transformed computational imaging, but traditional pixel-based representations limit its ability to capture continuous, multiscale details of objects. Here we introduce a novel Local Conditional Neural Fields (LCNF) framework that leverages a continuous implicit neural representation to address this limitation. LCNF enables flexible object representation and facilitates the reconstruction of multiscale information. We demonstrate the capabilities of LCNF by solving the highly ill-posed inverse problem in Fourier ptychographic microscopy (FPM) with multiplexed measurements, achieving robust, scalable, and generalizable large-scale phase retrieval. Unlike traditional neural field frameworks, LCNF incorporates a local conditional representation that promotes model generalization, learning of multiscale information, and efficient processing of large-scale imaging data. By combining an encoder and a decoder conditioned on a learned latent vector, LCNF achieves versatile continuous-domain super-resolution image reconstruction. We demonstrate accurate reconstruction of wide field-of-view, high-resolution phase images using only a few multiplexed measurements. LCNF robustly captures continuous object priors and eliminates various phase artifacts, even when trained on imperfect datasets. The framework exhibits strong generalization, reconstructing diverse objects even with limited training data. Furthermore, LCNF can be trained on a physics simulator using natural images and successfully applied to experimental measurements on biological samples. Our results highlight the potential of LCNF for solving large-scale inverse problems in computational imaging, with broad applicability across various deep-learning-based techniques.
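A minimal sketch of the coordinate-based, latent-conditioned decoder idea is shown below (PyTorch); the layer sizes, activation choices, and conditioning scheme are illustrative placeholders rather than the LCNF architecture itself.

```python
import torch
import torch.nn as nn

class ConditionalFieldDecoder(nn.Module):
    """Coordinate-based MLP decoder conditioned on a local latent vector,
    producing a continuous-valued output (e.g. phase) at arbitrary locations."""
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords, latent):
        # coords: (N, 2) continuous query positions; latent: (N, latent_dim)
        return self.mlp(torch.cat([coords, latent], dim=-1))

# Query the field at arbitrary (sub-pixel) coordinates
decoder = ConditionalFieldDecoder()
coords = torch.rand(1024, 2)        # normalized (x, y) positions
latent = torch.randn(1024, 64)      # local features produced by an encoder
phase = decoder(coords, latent)     # (1024, 1) reconstructed values
```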
Cross-Domain Label Propagation for Domain Adaptation with Discriminative Graph Self-Learning
Tian, Lei, Tang, Yongqiang, Hu, Liangchen, Zhang, Wensheng
Domain adaptation aims to transfer the knowledge of well-labeled source data to unlabeled target data. Many recent efforts focus on improving the prediction accuracy of target pseudo-labels to reduce conditional distribution shift. In this paper, we propose a novel domain adaptation method that infers target pseudo-labels through cross-domain label propagation, such that the underlying manifold structure of the data from both domains can be exploited. Unlike existing cross-domain label propagation methods that separate domain-invariant feature learning, affinity matrix construction, and target label inference into three independent stages, we integrate them into a unified optimization framework. In this way, the three parts can boost each other from an iterative optimization perspective, and thus more effective knowledge transfer can be achieved. Furthermore, to construct a high-quality affinity matrix, we propose a discriminative graph self-learning strategy, which can not only adaptively capture the inherent similarity of the data from the two domains but also effectively exploit the discriminative information contained in well-labeled source data and pseudo-labeled target data. An efficient iterative optimization algorithm is designed to solve the objective function of our proposal. Notably, the proposed method can be extended to semi-supervised domain adaptation in a simple but effective way, and the corresponding optimization problem can be solved with the same algorithm. Extensive experiments on six standard datasets verify the significant superiority of our proposal in both unsupervised and semi-supervised domain adaptation settings.
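For context, the standard closed-form label propagation step that this line of work builds on looks like the following NumPy sketch; the paper's contribution lies in coupling such a step with domain-invariant feature learning and discriminative graph self-learning in one optimization, which is not shown here.

```python
import numpy as np

def label_propagation(S, Y, alpha=0.99):
    """Classic graph-based label propagation closed form:
        F* = (1 - alpha) * (I - alpha * S)^{-1} Y,
    where S is a (row-normalized) affinity matrix over source + target samples
    and Y stacks one-hot source labels with zero rows for unlabeled targets."""
    n = S.shape[0]
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)

# Toy example: 4 samples (2 labeled source, 2 unlabeled target), 2 classes
S = np.array([[0, .5, .5, 0], [.5, 0, 0, .5], [.5, 0, 0, .5], [0, .5, .5, 0]])
S = S / S.sum(axis=1, keepdims=True)      # row-normalize the affinity graph
Y = np.array([[1, 0], [0, 1], [0, 0], [0, 0]], dtype=float)
F = label_propagation(S, Y)
print(F.argmax(axis=1))                   # pseudo-labels for all samples
```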
Adversarial Learning-based Stance Classifier for COVID-19-related Health Policies
Xie, Feng, Zhang, Zhong, Zhao, Xuechen, Wang, Haiyang, Zou, Jiaying, Tian, Lei, Zhou, Bin, Tan, Yusong
The ongoing COVID-19 pandemic has caused immeasurable losses for people worldwide. To contain the spread of the virus and further alleviate the crisis, various health policies (e.g., stay-at-home orders) have been issued, sparking heated discussions as users turn to social media to share their attitudes. In this paper, we consider a more realistic scenario for stance detection (i.e., cross-target and zero-shot settings) in the pandemic and propose an adversarial learning-based stance classifier to automatically identify the public's attitudes toward COVID-19-related health policies. Specifically, we adopt adversarial learning, which allows the model to train on a large amount of labeled data and capture transferable knowledge from source topics, enabling generalization to emerging health policies with sparse labeled data. To further deepen the model's understanding, we incorporate policy descriptions as external knowledge into the model. Meanwhile, a GeoEncoder is designed to encourage the model to capture unobserved background factors specific to each region and represent them as non-text information. We evaluate a broad range of baselines on the stance detection task for COVID-19-related health policies, and experimental results show that our proposed method achieves state-of-the-art performance in both cross-target and zero-shot settings.
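Adversarial learning for transferable (topic-invariant) features typically relies on a gradient reversal layer, so that the shared features stay discriminative for stance while becoming uninformative about the source topic. The PyTorch sketch below shows that standard mechanism; the heads, dimensions, and loss weighting are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated (scaled)
    gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class AdversarialStanceModel(nn.Module):
    def __init__(self, feat_dim=256, n_stances=3, n_topics=5, lambd=0.1):
        super().__init__()
        self.lambd = lambd
        self.stance_head = nn.Linear(feat_dim, n_stances)   # task classifier
        self.topic_head = nn.Linear(feat_dim, n_topics)     # adversary

    def forward(self, features):
        stance_logits = self.stance_head(features)
        topic_logits = self.topic_head(GradReverse.apply(features, self.lambd))
        return stance_logits, topic_logits

# Toy usage: shared text features in, stance and (adversarial) topic logits out
model = AdversarialStanceModel()
stance, topic = model(torch.randn(8, 256))
```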
User-Centric Indoor Air Quality Monitoring on Mobile Devices
Jiang, Yifei (University of Colorado, Boulder) | Li, Kun (University of Colorado, Boulder) | Piedrahita, Ricardo (University of Colorado, Boulder) | Yun, Xiang (University of Michigan) | Tian, Lei (University of Colorado, Boulder) | Mansata, Omkar M. (University of Michigan) | Lv, Qin (University of Colorado, Boulder) | Dick, Robert P. (University of Michigan) | Hannigan, Michael (University of Colorado, Boulder) | Shang, Li (University of Colorado, Boulder)
Since people spend a majority of their time indoors, indoor air quality (IAQ) can have a significant impact on human health, safety, productivity, and comfort. Due to the diversity and dynamics of people's indoor activities, it is important to monitor IAQ for each individual. Most existing air quality sensing systems are stationary or focus on outdoor air quality. In contrast, we propose MAQS, a user-centric mobile sensing system for IAQ monitoring. MAQS users carry portable, indoor location tracking and IAQ sensing devices that provide personalized IAQ information in real time. To improve accuracy and energy efficiency, MAQS incorporates three novel techniques: (1) an accurate temporal n-gram augmented Bayesian room localization method that requires few Wi-Fi fingerprints; (2) an air exchange rate based IAQ sensing method, which measures general IAQ using only CO$_2$ sensors; and (3) a zone-based proximity detection method for collaborative sensing, which saves energy and enables data sharing among users. MAQS has been deployed and evaluated via a real-world user study. This evaluation demonstrates that MAQS supports accurate personalized IAQ monitoring and quantitative analysis with high energy efficiency. We also found that study participants frequently experienced poor IAQ.
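To illustrate the CO$_2$-only sensing idea, the snippet below estimates an air exchange rate from two CO$_2$ readings using the standard first-order decay model; the actual MAQS sensing pipeline is more elaborate, so this should be read as a hedged sketch of the underlying computation.

```python
import math

def air_exchange_rate(c_t1, c_t2, c_out, dt_hours):
    """Estimate air exchange rate (per hour) from CO2 decay between two readings,
    using the first-order decay model C(t) = C_out + (C(0) - C_out) * exp(-AER * t).
    c_t1, c_t2: indoor CO2 (ppm) at the start and end of the interval;
    c_out: outdoor CO2 (ppm); dt_hours: interval length in hours."""
    return math.log((c_t1 - c_out) / (c_t2 - c_out)) / dt_hours

# Example: indoor CO2 falls from 1200 ppm to 800 ppm in 0.5 h, outdoor is 400 ppm
print(air_exchange_rate(1200, 800, 400, 0.5))   # ~1.39 air changes per hour
```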