Collaborating Authors: Cheng, Sibo


Machine learning for modelling unstructured grid data in computational physics: a review

arXiv.org Artificial Intelligence

Unstructured grid data are essential for modelling complex geometries and dynamics in computational physics. Yet, their inherent irregularity presents significant challenges for conventional machine learning (ML) techniques. This paper provides a comprehensive review of advanced ML methodologies designed to handle unstructured grid data in high-dimensional dynamical systems. Key approaches discussed include graph neural networks, transformer models with spatial attention mechanisms, interpolation-integrated ML methods, and meshless techniques such as physics-informed neural networks. These methodologies have proven effective across diverse fields, including fluid dynamics and environmental simulations. This review is intended as a guidebook for computational scientists seeking to apply ML approaches to unstructured grid data in their domains, as well as for ML researchers looking to address challenges in computational physics. It places special focus on how ML methods can overcome the inherent limitations of traditional numerical techniques and, conversely, how insights from computational physics can inform ML development. To support benchmarking, this review also provides a summary of open-access datasets of unstructured grid data in computational physics. Finally, emerging directions such as generative models with unstructured data, reinforcement learning for mesh generation, and hybrid physics-data-driven paradigms are discussed to inspire future advancements in this evolving field.
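
To give a taste of the graph-based approach the review surveys, the sketch below implements one message-passing layer over mesh connectivity in plain PyTorch. The feature sizes, MLP depths, and the toy square-cell mesh are illustrative assumptions, not taken from any surveyed paper.

```python
import torch
import torch.nn as nn

class MeshMessagePassing(nn.Module):
    """One message-passing layer over an unstructured mesh.

    Nodes carry physical states (e.g. velocity, pressure); edges follow
    the mesh connectivity, so irregular geometries need no resampling.
    """
    def __init__(self, dim):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.node_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, x, edge_index):
        # x: (num_nodes, dim); edge_index: (2, num_edges) as (src, dst) rows
        src, dst = edge_index
        messages = self.edge_mlp(torch.cat([x[src], x[dst]], dim=-1))
        # Aggregate incoming messages at each destination node.
        agg = torch.zeros_like(x).index_add_(0, dst, messages)
        return self.node_mlp(torch.cat([x, agg], dim=-1))

# Toy mesh: 4 nodes, bidirectional edges of one square cell (assumed layout).
x = torch.randn(4, 8)
edges = torch.tensor([[0, 1, 1, 2, 2, 3, 3, 0],
                      [1, 0, 2, 1, 3, 2, 0, 3]])
layer = MeshMessagePassing(8)
print(layer(x, edges).shape)  # torch.Size([4, 8])
```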


Tensor-Var: Variational Data Assimilation in Tensor Product Feature Space

arXiv.org Artificial Intelligence

Variational data assimilation estimates the states of a dynamical system by minimizing a cost function that fits numerical models to observational data. The widely used four-dimensional variational assimilation (4D-Var) method has two primary challenges: (1) it is computationally demanding for complex nonlinear systems, and (2) it relies on state-observation mappings, which are often not perfectly known. Deep learning (DL) has been used as a more expressive class of efficient model approximators to address these challenges. However, integrating such models into 4D-Var remains challenging due to their inherent nonlinearities and the lack of theoretical guarantees for consistency in assimilation results. In this paper, we propose Tensor-Var, which addresses these challenges using kernel Conditional Mean Embedding (CME). Tensor-Var improves optimization efficiency by characterizing system dynamics and state-observation mappings as linear operators, leading to a convex cost function in the feature space. Furthermore, our method provides a new perspective for incorporating CME into 4D-Var, offering theoretical guarantees of consistent assimilation results between the original and feature spaces. To improve scalability, we propose a method to learn deep features (DFs) using neural networks within the Tensor-Var framework. Experiments on chaotic systems and global weather prediction with real-time observations show that Tensor-Var outperforms conventional and DL-hybrid 4D-Var baselines in accuracy while achieving efficiency comparable to the static 3D-Var method.
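
To see why linear feature-space operators make the problem convex, here is a minimal numpy sketch of a quadratic 4D-Var cost solved in closed form. The random operators and identity error covariances are stand-ins for the learned CME/deep-feature quantities described in the abstract, not the paper's estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 16, 5                       # feature dimension, assimilation window

# Stand-ins for the learned linear operators (Tensor-Var estimates these
# from data; here they are random for illustration).
M = rng.normal(size=(d, d)) / np.sqrt(d)    # feature-space dynamics
H = rng.normal(size=(d, d))                 # feature-space observation map
z_b = rng.normal(size=d)                    # background (prior) feature state
g = [rng.normal(size=d) for _ in range(T)]  # embedded observations

# Convex 4D-Var in feature space (identity covariances assumed):
#   J(z0) = ||z0 - z_b||^2 + sum_t ||g_t - H M^t z0||^2
# Being quadratic, it is minimised exactly by the normal equations.
A = np.eye(d)
b = z_b.copy()
Mt = np.eye(d)
for t in range(T):
    Ht = H @ Mt           # maps the initial feature state to observation t
    A += Ht.T @ Ht
    b += Ht.T @ g[t]
    Mt = M @ Mt
z0 = np.linalg.solve(A, b)    # unique minimiser: no iterative descent needed
print(z0.shape)               # (16,)
```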


Fire-Image-DenseNet (FIDN) for predicting wildfire burnt area using remote sensing data

arXiv.org Artificial Intelligence

Predicting the extent of massive wildfires once ignited is essential to reduce subsequent socioeconomic losses and environmental damage, but it is challenging because of the complexity of fire behaviour. Existing physics-based models are limited in predicting large or long-duration wildfire events. Here, we develop a deep-learning-based predictive model, Fire-Image-DenseNet (FIDN), that uses spatial features derived from both near-real-time and reanalysis data on the environmental and meteorological drivers of wildfire. We trained and tested this model using more than 300 individual wildfires that occurred between 2012 and 2019 in the western US. In contrast to existing models, the performance of FIDN does not degrade with fire size or duration. Furthermore, it predicts the final burnt area accurately even in landscapes that are very heterogeneous in fuel density and flammability. The FIDN model showed higher accuracy, with a mean squared error (MSE) about 82% and 67% lower than those of the predictive models based on cellular automata (CA) and minimum travel time (MTT) approaches, respectively. Its structural similarity index measure (SSIM) averages 97%, outperforming the CA and FlamMap MTT models by 6% and 2%, respectively. Additionally, FIDN is approximately three orders of magnitude faster than both the CA and MTT models. These gains in computational efficiency and accuracy offer vital insights for strategic planning and resource allocation in firefighting operations.
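
The abstract does not detail FIDN's architecture, so the following is only a generic DenseNet-style sketch of the overall idea: stacked environmental rasters in, a per-pixel burnt-area probability out. Channel counts, depths, and the input variables named in comments are invented for illustration.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Densely connected convolutions: each layer sees all earlier features."""
    def __init__(self, in_ch, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth),
                nn.ReLU(),
                nn.Conv2d(in_ch + i * growth, growth, 3, padding=1),
            )
            for i in range(n_layers)
        ])

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)   # dense connectivity
        return x

# Illustrative input: stacked rasters (e.g. fuel, weather, terrain, active fire).
n_channels = 12
net = nn.Sequential(
    nn.Conv2d(n_channels, 32, 3, padding=1),
    DenseBlock(32, growth=16, n_layers=4),
    nn.Conv2d(32 + 4 * 16, 1, 1),
    nn.Sigmoid(),                 # per-pixel probability of final burnt area
)
x = torch.randn(2, n_channels, 128, 128)
print(net(x).shape)   # torch.Size([2, 1, 128, 128])
```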


Explainable Global Wildfire Prediction Models using Graph Neural Networks

arXiv.org Artificial Intelligence

Wildfire prediction has become increasingly crucial due to the escalating impacts of climate change. Traditional CNN-based wildfire prediction models struggle with handling missing oceanic data and addressing long-range dependencies across distant regions in meteorological data. In this paper, we introduce an innovative Graph Neural Network (GNN)-based model for global wildfire prediction. We propose a hybrid model that combines the spatial prowess of Graph Convolutional Networks (GCNs) with the temporal depth of Long Short-Term Memory (LSTM) networks. Our approach uniquely transforms global climate and wildfire data into a graph representation, addressing challenges such as null oceanic data locations and long-range dependencies inherent in traditional models. When benchmarked against established architectures on an unseen ensemble of JULES-INFERNO simulations, our model demonstrates superior predictive accuracy. Furthermore, we emphasise the model's explainability, unveiling potential wildfire correlation clusters through community detection and elucidating feature importance via Integrated Gradient analysis. Our findings not only advance the methodological domain of wildfire prediction but also underscore the importance of model transparency, offering valuable insights for stakeholders in wildfire management.
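
A minimal sketch of the hybrid design described above: a graph convolution mixes information spatially at each time step, and an LSTM then models each node's temporal evolution. The identity adjacency placeholder, feature sizes, and single-layer GCN are simplifying assumptions.

```python
import torch
import torch.nn as nn

class GCNLSTM(nn.Module):
    """Spatial GCN per time step, then an LSTM over each node's sequence."""
    def __init__(self, in_dim, hidden):
        super().__init__()
        self.gcn_weight = nn.Linear(in_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, adj_norm):
        # x: (T, N, in_dim) node feature sequence (land cells only, so
        #    missing oceanic points never enter the graph)
        # adj_norm: (N, N) normalised adjacency, e.g. D^-1/2 (A + I) D^-1/2
        h = torch.relu(adj_norm @ self.gcn_weight(x))   # (T, N, hidden)
        h = h.transpose(0, 1)                           # (N, T, hidden)
        out, _ = self.lstm(h)
        return self.head(out[:, -1])        # next-step fire risk per node

T, N, F = 12, 100, 8
adj = torch.eye(N)       # placeholder; the real graph links nearby cells
model = GCNLSTM(F, 32)
print(model(torch.randn(T, N, F), adj).shape)  # torch.Size([100, 1])
```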


Multi-fidelity physics constrained neural networks for dynamical systems

arXiv.org Artificial Intelligence

Physics-constrained neural networks are commonly employed to enhance prediction robustness compared to purely data-driven models, achieved through the inclusion of physical constraint losses during model training. However, a major challenge of physics-constrained neural networks is their training complexity, especially for high-dimensional systems. Conventional physics-constrained models rely on single-fidelity data, so physical constraints must be evaluated on high-dimensional fields, which is computationally demanding. Furthermore, because neural networks have a fixed input size, employing multi-fidelity training data can also be cumbersome. In this paper, we propose the Multi-Scale Physics-Constrained Neural Network (MSPCNN), which offers a novel methodology for incorporating data with different levels of fidelity into a unified latent space through a customised multi-fidelity autoencoder. Additionally, multiple decoders are trained concurrently to map latent representations of inputs into the physical spaces of the various fidelities. As a result, during the training of predictive models, physical constraints can be evaluated in low-fidelity spaces, yielding a trade-off between training efficiency and accuracy. In addition, unlike conventional methods, MSPCNN can also employ multi-fidelity data to train the predictive model. We assess the performance of MSPCNN on two fluid dynamics problems, namely a two-dimensional Burgers' system and a shallow water system. Numerical results clearly demonstrate enhanced prediction accuracy and noise robustness when physical constraints are introduced in low-fidelity fields. At the same time, as expected, training complexity is significantly reduced by computing the physical constraint loss in the low-fidelity field rather than the high-fidelity one.
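
The following sketch illustrates the central trick under stated assumptions: encoders for each fidelity share one latent space, and the physics residual is evaluated on the cheap low-fidelity decoding while reconstruction is supervised at high fidelity. The linear encoders and the placeholder residual are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiFidelityAE(nn.Module):
    """Shared latent space with one encoder/decoder pair per fidelity."""
    def __init__(self, dims, latent):    # dims = {"low": n_lo, "high": n_hi}
        super().__init__()
        self.encoders = nn.ModuleDict(
            {k: nn.Linear(n, latent) for k, n in dims.items()})
        self.decoders = nn.ModuleDict(
            {k: nn.Linear(latent, n) for k, n in dims.items()})

    def encode(self, x, fidelity):
        return torch.tanh(self.encoders[fidelity](x))

    def decode(self, z, fidelity):
        return self.decoders[fidelity](z)

def physics_residual_low(field_lo):
    # Placeholder constraint, e.g. a discrete conservation check on the
    # coarse field; the point is that it touches n_lo values, not n_hi.
    return (field_lo.sum(dim=-1) ** 2).mean()

ae = MultiFidelityAE({"low": 64, "high": 4096}, latent=16)
x_hi = torch.randn(8, 4096)
z = ae.encode(x_hi, "high")
# Evaluate the physics loss in the cheap low-fidelity space:
loss_phys = physics_residual_low(ae.decode(z, "low"))
loss_rec = nn.functional.mse_loss(ae.decode(z, "high"), x_hi)
(loss_rec + 0.1 * loss_phys).backward()
```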


Freeze the backbones: A Parameter-Efficient Contrastive Approach to Robust Medical Vision-Language Pre-training

arXiv.org Artificial Intelligence

Modern healthcare often utilises radiographic images alongside textual reports for diagnostics, encouraging the use of Vision-Language Self-Supervised Learning (VL-SSL) with large pre-trained models to learn versatile medical vision representations. However, most existing VL-SSL frameworks are trained end-to-end, which is computation-heavy and can lose vital prior information embedded in pre-trained encoders. To address both issues, we introduce the backbone-agnostic Adaptor framework, which preserves medical knowledge in pre-trained image and text encoders by keeping them frozen, and employs a lightweight Adaptor module for cross-modal learning. Experiments on medical image classification and segmentation tasks across three datasets reveal that our framework delivers competitive performance while cutting trainable parameters by over 90% compared to current pre-training approaches. Notably, when fine-tuned with just 1% of data, Adaptor outperforms several Transformer-based methods trained on full datasets in medical image segmentation.
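
A minimal sketch of the frozen-backbone recipe: features are pre-extracted from frozen encoders, and only a small projection module with a symmetric contrastive loss is trained. The feature dimensions and plain InfoNCE formulation are assumptions consistent with, but not copied from, the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adaptor(nn.Module):
    """Small trainable module bridging two frozen unimodal encoders."""
    def __init__(self, img_dim, txt_dim, dim=256):
        super().__init__()
        self.proj_img = nn.Linear(img_dim, dim)
        self.proj_txt = nn.Linear(txt_dim, dim)
        self.temperature = nn.Parameter(torch.tensor(0.07))

    def forward(self, img_feat, txt_feat):
        zi = F.normalize(self.proj_img(img_feat), dim=-1)
        zt = F.normalize(self.proj_txt(txt_feat), dim=-1)
        logits = zi @ zt.t() / self.temperature
        targets = torch.arange(len(zi))
        # Symmetric InfoNCE: match each image to its paired report.
        return (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.t(), targets)) / 2

# Pre-extracted features from frozen pre-trained encoders; no gradients
# flow into the backbones, so the vast majority of parameters stay fixed.
img_feat = torch.randn(16, 768)   # e.g. a frozen ViT [CLS] embedding
txt_feat = torch.randn(16, 512)   # e.g. a frozen clinical BERT embedding
adaptor = Adaptor(768, 512)
loss = adaptor(img_feat, txt_feat)
loss.backward()
```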


G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training

arXiv.org Artificial Intelligence

Recently, medical vision-language pre-training (VLP) has made substantial progress in learning global visual representations from medical images and their paired radiology reports. However, medical imaging tasks in the real world usually require finer granularity in visual features. These tasks include visual localization tasks (e.g., semantic segmentation, object detection) and visual grounding tasks. Yet, current medical VLP methods face challenges in learning these fine-grained features, as they primarily focus on brute-force alignment between image patches and individual text tokens for local visual feature learning, which is suboptimal for downstream dense prediction tasks. In this work, we propose a new VLP framework, named \textbf{G}lobal to \textbf{D}ense level representation learning (G2D), that achieves significantly improved granularity and more accurate grounding for the learned features compared to existing medical VLP approaches. In particular, G2D learns dense and semantically grounded image representations via a pseudo segmentation task carried out in parallel with the global vision-language alignment. Notably, generating pseudo segmentation targets does not incur extra trainable parameters: they are obtained on the fly during VLP with a parameter-free processor. G2D achieves superior performance across 6 medical imaging tasks and 25 diseases, particularly in semantic segmentation, which necessitates fine-grained, semantically grounded image features. In this task, G2D surpasses peer models even when fine-tuned with just 1\% of the training data, compared to the 100\% used by these models. The code will be released upon acceptance.
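
The abstract says only that pseudo segmentation targets come from a parameter-free processor during VLP; the snippet below shows one plausible instantiation (patch-report similarity thresholded per image). It is a guess at the mechanism, not the paper's actual processor.

```python
import torch
import torch.nn.functional as F

def pseudo_seg_targets(patch_feats, report_feat):
    """Hypothetical parameter-free processor: derive a pseudo mask from how
    strongly each image patch agrees with the paired report embedding.

    patch_feats: (B, N, D) patch embeddings; report_feat: (B, D).
    No trainable weights are introduced: only similarity and a threshold.
    """
    sim = F.cosine_similarity(patch_feats,
                              report_feat.unsqueeze(1), dim=-1)  # (B, N)
    thresh = sim.mean(dim=1, keepdim=True)      # per-image adaptive cut
    return (sim > thresh).float()               # binary pseudo mask

B, N, D = 4, 196, 512                 # e.g. 14x14 patches from a ViT
patch_feats = torch.randn(B, N, D)
report_feat = torch.randn(B, D)
masks = pseudo_seg_targets(patch_feats, report_feat)
# A dense head would then be trained against these masks in parallel
# with the usual global image-report contrastive alignment.
print(masks.shape)   # torch.Size([4, 196])
```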


Spectral Cross-Domain Neural Network with Soft-adaptive Threshold Spectral Enhancement

arXiv.org Artificial Intelligence

Electrocardiography (ECG) signals can be considered as multi-variable time series. State-of-the-art ECG data classification approaches, based on either feature engineering or deep learning techniques, treat the spectral and time domains separately in machine learning systems. Current approaches have no spectral-time domain communication mechanism inside the classifier model, leading to difficulties in identifying complex ECG forms. In this paper, we propose a novel deep learning model named Spectral Cross-domain Neural Network (SCDNN), with a new block called Soft-adaptive Threshold Spectral Enhancement (SATSE), to simultaneously reveal the key information embedded in the spectral and time domains inside the neural network. More precisely, the cross-domain information is captured by a general convolutional neural network (CNN) backbone, and different information sources are merged by a self-adaptive mechanism to mine the connection between the time and spectral domains. In SATSE, the knowledge from the time and spectral domains is extracted via the Fast Fourier Transform (FFT) with soft trainable thresholds in modified sigmoid functions. The proposed SCDNN is tested on several classification tasks implemented on the public ECG databases \textit{PTB-XL} and \textit{MIT-BIH}. SCDNN outperforms state-of-the-art approaches at low computational cost on a variety of metrics in all classification tasks on both databases, by finding appropriate domains from the infinite spectral mapping. The convergence of the trainable thresholds in the spectral domain is also numerically investigated in this paper. The robust performance of SCDNN provides a new perspective for exploiting knowledge across the time and spectral domains in deep learning models. The repository can be found at https://github.com/DL-WG/SCDNN-TS
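
A minimal sketch of what a SATSE-style block could look like: features are moved to the spectral domain with an FFT, gated by a trainable soft threshold inside a sigmoid, and transformed back. The per-bin threshold parameterisation and the steepness constant are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class SATSE(nn.Module):
    """Soft-adaptive threshold spectral enhancement (illustrative sketch).

    Moves features to the spectral domain with an FFT, gates frequency
    magnitudes with a trainable soft threshold inside a sigmoid, and
    returns to the time domain, so both domains are used in one block.
    """
    def __init__(self, n_freq, steepness=10.0):
        super().__init__()
        self.threshold = nn.Parameter(torch.zeros(n_freq))
        self.steepness = steepness       # assumed fixed sigmoid sharpness

    def forward(self, x):
        # x: (batch, channels, length) time-domain features from the CNN
        spec = torch.fft.rfft(x, dim=-1)                  # complex spectrum
        mag = spec.abs()
        # Soft gate: a shifted sigmoid keeps frequencies above a learned
        # per-bin threshold and smoothly attenuates the rest.
        gate = torch.sigmoid(self.steepness * (mag - self.threshold))
        return torch.fft.irfft(spec * gate, n=x.shape[-1], dim=-1)

x = torch.randn(8, 16, 250)            # e.g. one windowed ECG recording
block = SATSE(n_freq=250 // 2 + 1)
print(block(x).shape)                  # torch.Size([8, 16, 250])
```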


Efficient deep data assimilation with sparse observations and time-varying sensors

arXiv.org Artificial Intelligence

Variational Data Assimilation (DA) has been broadly used in engineering problems for field reconstruction and prediction by performing a weighted combination of multiple sources of noisy data. In recent years, the integration of deep learning (DL) techniques in DA has shown promise in improving efficiency and accuracy in high-dimensional dynamical systems. Nevertheless, existing deep DA approaches face difficulties in dealing with unstructured observation data, especially when the placement and number of sensors change over time. We introduce a novel variational DA scheme, named Voronoi-tessellation Inverse operator for VariatIonal Data assimilation (VIVID), that incorporates a DL inverse operator into the assimilation objective function. By leveraging the capabilities of Voronoi tessellation and convolutional neural networks, VIVID is adept at handling sparse, unstructured, and time-varying sensor data. Furthermore, the incorporation of the DL inverse operator establishes a direct link between observation and state space, reducing the number of minimization steps required for DA. Additionally, VIVID can be seamlessly integrated with Proper Orthogonal Decomposition (POD) to develop an end-to-end reduced-order DA scheme, which can further expedite field reconstruction. Numerical experiments in a fluid dynamics system demonstrate that VIVID can significantly outperform existing DA and DL algorithms. The robustness of VIVID is also assessed through the application of various levels of prior error, the use of varying numbers of sensors, and the misspecification of error covariance in DA.
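
A minimal sketch of the two ingredients named above, under assumptions: scattered sensors are rasterised onto a fixed grid via a Voronoi (nearest-sensor) tessellation, and a small CNN stands in for the inverse operator from observation image to state. The grid size, sensor layout, and network are illustrative.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.spatial import cKDTree

def voronoi_image(sensor_xy, sensor_vals, grid=64):
    """Rasterise scattered sensors: each pixel takes the value of its
    nearest sensor, i.e. a Voronoi tessellation of the domain. The image
    shape stays fixed, so the CNN below tolerates moving/varying sensors."""
    ys, xs = np.mgrid[0:1:grid * 1j, 0:1:grid * 1j]
    pix = np.column_stack([xs.ravel(), ys.ravel()])
    _, idx = cKDTree(sensor_xy).query(pix)
    return sensor_vals[idx].reshape(grid, grid)

# An inverse operator: CNN from the tessellated observation image to the
# full state; VIVID plugs such an operator into the variational DA cost.
inverse_op = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)

rng = np.random.default_rng(1)
xy = rng.uniform(size=(20, 2))          # 20 sensors at arbitrary positions
vals = rng.normal(size=20)
obs_img = torch.tensor(voronoi_image(xy, vals), dtype=torch.float32)
state = inverse_op(obs_img[None, None])  # (1, 1, 64, 64) reconstructed field
print(state.shape)
```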


IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training

arXiv.org Artificial Intelligence

In the field of medical Vision-Language Pre-training (VLP), significant efforts have been devoted to deriving text and image features from both clinical reports and associated medical images. However, most existing methods overlook the opportunity to leverage the inherent hierarchical structure of clinical reports, which are generally split into `findings' for descriptive content and `impressions' for conclusive observations. Instead of utilizing this rich, structured format, current medical VLP approaches often simplify the report into either a unified entity or fragmented tokens. In this work, we propose a novel clinical-prior-guided VLP framework named IMITATE that learns the structural information of medical reports through hierarchical vision-language alignment. The framework derives multi-level visual features from chest X-ray (CXR) images and separately aligns these features with the descriptive and conclusive text encoded in the hierarchical medical report. Furthermore, a new clinically informed contrastive loss is introduced for cross-modal learning, which accounts for clinical prior knowledge when formulating sample correlations in contrastive learning. The proposed model, IMITATE, outperforms baseline VLP methods across six different datasets, spanning five medical imaging downstream tasks. Comprehensive experimental results highlight the advantages of integrating the hierarchical structure of medical reports for vision-language alignment.
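
A minimal sketch of hierarchical alignment under assumptions: intermediate visual features are contrasted with the encoded `findings' text, and pooled global features with `impressions'. The clinically informed loss from the paper is replaced here by plain InfoNCE for brevity, and all embeddings are random stand-ins.

```python
import torch
import torch.nn.functional as F

def align(z_img, z_txt, temp=0.07):
    """Plain InfoNCE alignment between one visual level and one text field."""
    zi, zt = F.normalize(z_img, dim=-1), F.normalize(z_txt, dim=-1)
    logits = zi @ zt.t() / temp
    return F.cross_entropy(logits, torch.arange(len(zi)))

# Hierarchical pairing (illustrative): lower-level CXR features carry the
# descriptive detail matched to 'findings'; the final pooled feature
# carries the conclusion matched to 'impressions'.
B, D = 16, 256
mid_level_img = torch.randn(B, D, requires_grad=True)   # intermediate stage
high_level_img = torch.randn(B, D, requires_grad=True)  # pooled global feature
findings_txt = torch.randn(B, D)       # encoded 'findings' section
impressions_txt = torch.randn(B, D)    # encoded 'impressions' section

loss = (align(mid_level_img, findings_txt)
        + align(high_level_img, impressions_txt))
loss.backward()
print(loss.item())
```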