Wang, Weijie
Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems
Che, Baojun, Chen, Yifan, Huan, Zhenghao, Huang, Daniel Zhengyu, Wang, Weijie
This paper is concerned with the approximation of probability distributions known up to normalization constants, with a focus on Bayesian inference for large-scale inverse problems in scientific computing. In this context, key challenges include costly repeated evaluations of forward models, multimodality, and inaccessible gradients for the forward model. To address them, we develop a variational inference framework that combines Fisher-Rao natural gradient with specialized quadrature rules to enable derivative free updates of Gaussian mixture variational families. The resulting method, termed Derivative Free Gaussian Mixture Variational Inference (DF-GMVI), guarantees covariance positivity and affine invariance, offering a stable and efficient framework for approximating complex posterior distributions. The effectiveness of DF-GMVI is demonstrated through numerical experiments on challenging scenarios, including distributions with multiple modes, infinitely many modes, and curved modes in spaces with up to hundreds of dimensions. The method's practicality is further demonstrated in a large-scale application, where it successfully recovers the initial conditions of the Navier-Stokes equations from solution data at positive times.
An LLM Agent for Automatic Geospatial Data Analysis
Chen, Yuxing, Wang, Weijie, Lobry, Sylvain, Kurtz, Camille
Large language models (LLMs) are being used in data science code generation tasks, but they often struggle with complex sequential tasks, leading to logical errors. Their application to geospatial data processing is particularly challenging due to difficulties in incorporating complex data structures and spatial constraints, effectively utilizing diverse function calls, and the tendency to hallucinate less-used geospatial libraries. To tackle these problems, we introduce GeoAgent, a new interactive framework designed to help LLMs handle geospatial data processing more effectively. GeoAgent pioneers the integration of a code interpreter, static analysis, and Retrieval-Augmented Generation (RAG) techniques within a Monte Carlo Tree Search (MCTS) algorithm, offering a novel approach to geospatial data processing. In addition, we contribute a new benchmark specifically designed to evaluate LLM-based approaches on geospatial tasks. This benchmark leverages a variety of Python libraries and includes both single-turn and multi-turn tasks such as data acquisition, data analysis, and visualization. By offering a comprehensive evaluation across diverse geospatial contexts, this benchmark sets a new standard for developing LLM-based approaches in geospatial data analysis tasks. Our findings suggest that relying solely on the LLM's own knowledge is insufficient for accurate geospatial task programming, which requires coherent multi-step processes and multiple function calls. Compared to the baseline LLMs, the proposed GeoAgent has demonstrated superior performance, yielding notable improvements in function calls and task completion. In addition, these results offer valuable insights for the future development of LLM agents in automatic geospatial data analysis task programming.
A UAV-assisted Wireless Localization Challenge on AERPAW
Kudyba, Paul, Mandapaka, Jaya Sravani, Wang, Weijie, McCorkendale, Logan, McCorkendale, Zachary, Kidane, Mathias, Sun, Haijian, Adams, Eric, Namuduri, Kamesh, Fund, Fraida, Sichitiu, Mihail, Ozdemir, Ozgur
As wireless researchers are tasked to enable wireless communication as infrastructure in more dynamic aerial settings, there is a growing need for large-scale experimental platforms that provide realistic, reproducible, and reliable experimental validation. To bridge the research-to-implementation gap, the Aerial Experimentation and Research Platform for Advanced Wireless (AERPAW) offers open-source tools, reference experiments, and hardware to facilitate and evaluate the development of wireless research in controlled digital twin environments and live testbed flights. The inaugural AERPAW Challenge, "Find a Rover," was issued to spark collaborative efforts and test the platform's capabilities. The task involved localizing a narrowband wireless signal, with teams given ten minutes to find the "rover" within a twenty-acre area. By engaging in this exercise, researchers can validate the platform's value as a tool for innovation in wireless communications research within aerial robotics. This paper recounts the methods and experiences of the top three teams in rapidly locating a wireless signal by automating the control of an aerial drone in a realistic testbed scenario.
HYPERmotion: Learning Hybrid Behavior Planning for Autonomous Loco-manipulation
Wang, Jin, Dai, Rui, Wang, Weijie, Rossini, Luca, Ruscelli, Francesco, Tsagarakis, Nikos
Enabling robots to autonomously perform hybrid motions in diverse environments can be beneficial for long-horizon tasks such as material handling, household chores, and work assistance. This requires extensive exploitation of intrinsic motion capabilities, extraction of affordances from rich environmental information, and planning of physical interaction behaviors. Although recent progress has demonstrated impressive humanoid whole-body control abilities, current systems still struggle to achieve versatility and adaptability for new tasks. In this work, we propose HYPERmotion, a framework that learns, selects and plans behaviors based on tasks in different scenarios. We combine reinforcement learning with whole-body optimization to generate motion for 38 actuated joints and create a motion library to store the learned skills. We apply the planning and reasoning features of large language models (LLMs) to complex loco-manipulation tasks, constructing a hierarchical task graph that comprises a series of primitive behaviors to bridge lower-level execution with higher-level planning. By leveraging the interaction of distilled spatial geometry and 2D observation with a visual language model (VLM), we ground knowledge into a robotic morphology selector that chooses appropriate actions in single- or dual-arm, legged or wheeled locomotion. Experiments in simulation and the real world show that learned motions can efficiently adapt to new tasks, demonstrating high autonomy from free-text commands in unstructured scenes. Videos and website: hy-motion.github.io/
Turn Fake into Real: Adversarial Head Turn Attacks Against Deepfake Detection
Wang, Weijie, Zhao, Zhengyu, Sebe, Nicu, Lepri, Bruno
Malicious use of deepfakes leads to serious public concerns and reduces people's trust in digital media. Although effective deepfake detectors have been proposed, they are substantially vulnerable to adversarial attacks. To evaluate the detector's robustness, recent studies have explored various attacks. However, all existing attacks are limited to 2D image perturbations, which are hard to translate into real-world facial changes. In this paper, we propose adversarial head turn (AdvHeat), the first attempt at 3D adversarial face views against deepfake detectors, based on face view synthesis from a single-view fake image. Extensive experiments validate the vulnerability of various detectors to AdvHeat in realistic, black-box scenarios. For example, AdvHeat based on a simple random search yields a high attack success rate of 96.8% with 360 searching steps. When additional query access is allowed, we can further reduce the step budget to 50. Additional analyses demonstrate that AdvHeat is better than conventional attacks on both the cross-detector transferability and robustness to defenses. The adversarial images generated by AdvHeat are also shown to have natural looks. Our code, including that for generating a multi-view dataset consisting of 360 synthetic views for each of 1000 IDs from FaceForensics++, is available at https://github.com/twowwj/AdvHeaT.
Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration
Mei, Guofeng, Tang, Hao, Huang, Xiaoshui, Wang, Weijie, Liu, Juan, Zhang, Jian, Van Gool, Luc, Wu, Qiang
Deep point cloud registration methods struggle with partial overlaps and rely on labeled data. To address these issues, we propose UDPReg, an unsupervised deep probabilistic registration framework for point clouds with partial overlaps. Specifically, we first adopt a network to learn posterior probability distributions of Gaussian mixture models (GMMs) from point clouds. To handle partial point cloud registration, we apply the Sinkhorn algorithm to predict the distribution-level correspondences under the constraint of the mixing weights of GMMs. To enable unsupervised learning, we design three distribution consistency-based losses: self-consistency, cross-consistency, and local contrastive. The self-consistency loss is formulated by encouraging GMMs in Euclidean and feature spaces to share identical posterior distributions. The cross-consistency loss derives from the fact that the points of two partially overlapping point clouds belonging to the same clusters share the cluster centroids. The cross-consistency loss allows the network to flexibly learn a transformation-invariant posterior distribution of two aligned point clouds. The local contrastive loss helps the network extract discriminative local features. Our UDPReg achieves competitive performance on the 3DMatch/3DLoMatch and ModelNet/ModelLoNet benchmarks.
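As a rough illustration of the Sinkhorn step mentioned in the abstract above, the sketch below matches the components of two small mixtures with standard balanced entropic optimal transport, using the mixing weights as marginal constraints. This is not the UDPReg implementation: the function name `sinkhorn`, the squared-distance cost between component means, the regularization value `eps`, and the toy means and weights are all assumptions chosen for illustration.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.5, n_iters=500):
    """Balanced entropic OT: returns a plan P with row sums ~a and column sums ~b.

    cost : (n, m) pairwise cost between mixture components (assumed here)
    a, b : mixing weights of the two mixtures, each summing to 1
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

# Hypothetical toy data: component means and mixing weights of two mixtures.
mu_x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
mu_y = np.array([[0.1, 0.0], [1.0, 0.1], [0.0, 0.9]])
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])

# Squared Euclidean distance between component means as the matching cost.
cost = ((mu_x[:, None, :] - mu_y[None, :, :]) ** 2).sum(axis=-1)
P = sinkhorn(cost, a, b)
```

Each entry `P[i, j]` can be read as a soft correspondence between component `i` of the first mixture and component `j` of the second; the marginal constraints keep the total mass assigned to each component equal to its mixing weight.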
A Novel Framework Integrating AI Model and Enzymological Experiments Promotes Identification of SARS-CoV-2 3CL Protease Inhibitors and Activity-based Probe
Hu, Fan, Wang, Lei, Hu, Yishen, Wang, Dongqi, Wang, Weijie, Jiang, Jianbing, Li, Nan, Yin, Peng
The identification of protein-ligand interaction plays a key role in biochemical research and drug discovery. Although deep learning has recently shown great promise in discovering new drugs, there remains a gap between deep learning-based and experimental approaches. Here we propose a novel framework, named AIMEE, integrating AI Model and Enzymology Experiments, to identify inhibitors against 3CL protease of SARS-CoV-2, which has taken a significant toll on people across the globe. From a bioactive chemical library, we have conducted two rounds of experiments and identified six novel inhibitors with a hit rate of 29.41%, and four of them showed an IC50 value less than 3 μM. Moreover, we explored the interpretability of the central model in AIMEE, mapping the deep learning extracted features to domain knowledge of chemical properties. Based on this knowledge, a commercially available compound was selected and proven to be an activity-based probe of 3CLpro. This work highlights the great potential of combining deep learning models and biochemical experiments for intelligent iteration and expanding the boundaries of drug discovery.