dense block
Fire-Image-DenseNet (FIDN) for predicting wildfire burnt area using remote sensing data
Pang, Bo, Cheng, Sibo, Huang, Yuhan, Jin, Yufang, Guo, Yike, Prentice, I. Colin, Harrison, Sandy P., Arcucci, Rossella
Predicting the extent of massive wildfires once ignited is essential to reduce the subsequent socioeconomic losses and environmental damage, but challenging because of the complexity of fire behaviour. Existing physics-based models are limited in predicting large or long-duration wildfire events. Here, we develop a deep-learning-based predictive model, Fire-Image-DenseNet (FIDN), that uses spatial features derived from both near real-time and reanalysis data on the environmental and meteorological drivers of wildfire. We trained and tested this model using more than 300 individual wildfires that occurred between 2012 and 2019 in the western US. In contrast to existing models, the performance of FIDN does not degrade with fire size or duration. Furthermore, it predicts final burnt area accurately even in very heterogeneous landscapes in terms of fuel density and flammability. The FIDN model showed higher accuracy, with a mean squared error (MSE) about 82% and 67% lower than those of the predictive models based on cellular automata (CA) and the minimum travel time (MTT) approaches, respectively. Its structural similarity index measure (SSIM) averages 97%, outperforming the CA and FlamMap MTT models by 6% and 2%, respectively. Additionally, FIDN is approximately three orders of magnitude faster than both CA and MTT models. The enhanced computational efficiency and accuracy advancements offer vital insights for strategic planning and resource allocation for firefighting operations.
Reviews: Joint Sub-bands Learning with Clique Structures for Wavelet Domain Super-Resolution
Summary: This paper proposes a CNN architecture called SRCliqueNet for single-image super-resolution (SISR), and it consists of two key parts, feature embedding net (FEN) and the image reconstruction net (IRN). FEN consists of two convolutional layers and a clique block group. The first convolutional layer tries to increase the number of channels of input, which can be added with the output of the clique block group via the skip connection. The second convlutional layer tries to change the number of channels so that they can fit the input of clique block group. The clique block group concatenates features from a sequence of clique blocks, each of which has two stages: 1) the first stage does the same things as dense block, while the second stage distills the feature further. Following the idea of resnet, a clique block contains more skip connections compared with a dense block, so the information among layers can be more easily propagated.
Predictive Mapping of Spectral Signatures from RGB Imagery for Off-Road Terrain Analysis
Prajapati, Sarvesh, Trivedi, Ananya, Maxwell, Bruce, Padir, Taskin
Accurate identification of complex terrain characteristics, such as soil composition and coefficient of friction, is essential for model-based planning and control of mobile robots in off-road environments. Spectral signatures leverage distinct patterns of light absorption and reflection to identify various materials, enabling precise characterization of their inherent properties. Recent research in robotics has explored the adoption of spectroscopy to enhance perception and interaction with environments. However, the significant cost and elaborate setup required for mounting these sensors present formidable barriers to widespread adoption. In this study, we introduce RS-Net (RGB to Spectral Network), a deep neural network architecture designed to map RGB images to corresponding spectral signatures. We illustrate how RS-Net can be synergistically combined with Co-Learning techniques for terrain property estimation. Initial results demonstrate the effectiveness of this approach in characterizing spectral signatures across an extensive off-road real-world dataset. These findings highlight the feasibility of terrain property estimation using only RGB cameras.
DensePANet: An improved generative adversarial network for photoacoustic tomography image reconstruction from sparse data
Hakimnejad, Hesam, Azimifar, Zohreh, Goshtasbi, Narjes
Image reconstruction is an essential step of every medical imaging method, including Photoacoustic Tomography (PAT), which is a promising modality of imaging, that unites the benefits of both ultrasound and optical imaging methods. Reconstruction of PAT images using conventional methods results in rough artifacts, especially when applied directly to sparse PAT data. In recent years, generative adversarial networks (GANs) have shown a powerful performance in image generation as well as translation, rendering them a smart choice to be applied to reconstruction tasks. In this study, we proposed an end-to-end method called DensePANet to solve the problem of PAT image reconstruction from sparse data. The proposed model employs a novel modification of UNet in its generator, called FD-UNet++, which considerably improves the reconstruction performance. We evaluated the method on various in-vivo and simulated datasets. Quantitative and qualitative results show the better performance of our model over other prevalent deep learning techniques.
Learning-based Bone Quality Classification Method for Spinal Metastasis
Peng, Shiqi, Lai, Bolin, Yao, Guangyu, Zhang, Xiaoyun, Zhang, Ya, Wang, Yan-Feng, Zhao, Hui
Spinal metastasis is the most common disease in bone metastasis and may cause pain, instability and neurological injuries. Early detection of spinal metastasis is critical for accurate staging and optimal treatment. The diagnosis is usually facilitated with Computed Tomography (CT) scans, which requires considerable efforts from well-trained radiologists. In this paper, we explore a learning-based automatic bone quality classification method for spinal metastasis based on CT images. We simultaneously take the posterolateral spine involvement classification task into account, and employ multi-task learning (MTL) technique to improve the performance. MTL acts as a form of inductive bias which helps the model generalize better on each task by sharing representations between related tasks. Based on the prior knowledge that the mixed type can be viewed as both blastic and lytic, we model the task of bone quality classification as two binary classification sub-tasks, i.e., whether blastic and whether lytic, and leverage a multiple layer perceptron to combine their predictions. In order to make the model more robust and generalize better, self-paced learning is adopted to gradually involve from easy to more complex samples into the training process. The proposed learning-based method is evaluated on a proprietary spinal metastasis CT dataset. At slice level, our method significantly outperforms an 121-layer DenseNet classifier in sensitivities by $+12.54\%$, $+7.23\%$ and $+29.06\%$ for blastic, mixed and lytic lesions, respectively, meanwhile $+12.33\%$, $+23.21\%$ and $+34.25\%$ at vertebrae level.
LightCAM: A Fast and Light Implementation of Context-Aware Masking based D-TDNN for Speaker Verification
Cao, Di, Wang, Xianchen, Zhou, Junfeng, Zhang, Jiakai, Lei, Yanjing, Chen, Wenpeng
Traditional Time Delay Neural Networks (TDNN) have achieved state-of-the-art performance at the cost of high computational complexity and slower inference speed, making them difficult to implement in an industrial environment. The Densely Connected Time Delay Neural Network (D-TDNN) with Context Aware Masking (CAM) module has proven to be an efficient structure to reduce complexity while maintaining system performance. In this paper, we propose a fast and lightweight model, LightCAM, which further adopts a depthwise separable convolution module (DSM) and uses multi-scale feature aggregation (MFA) for feature fusion at different levels. Extensive experiments are conducted on VoxCeleb dataset, the comparative results show that it has achieved an EER of 0.83 and MinDCF of 0.0891 in VoxCeleb1-O, which outperforms the other mainstream speaker verification methods. In addition, complexity analysis further demonstrates that the proposed architecture has lower computational cost and faster inference speed.