deep learning approach
Deep Survival Analysis of Longitudinal EHR Data for Joint Prediction of Hospitalization and Death in COPD Patients
Manzini, Enrico, Saito, Thomas Gonzalez, Escudero, Joan, Génova, Ana, Caso, Cristina, Perez-Porcuna, Tomas, Perera-Lluna, Alexandre
Patients with chronic obstructive pulmonary disease (COPD) have an increased risk of hospitalizations, strongly associated with decreased survival, yet predicting the timing of these events remains challenging and has received limited attention in the literature. In this study, we performed survival analysis to predict hospitalization and death in COPD patients using longitudinal electronic health records (EHRs), comparing statistical models, machine learning (ML), and deep learning (DL) approaches. We analyzed data from more than 150k patients from the SIDIAP database in Catalonia, Spain, from 2013 to 2017, modeling hospitalization as a first event and death as a semi-competing terminal event. Multiple models were evaluated, including Cox proportional hazards, SurvivalBoost, DeepPseudo, SurvTRACE, Dynamic Deep-Hit, and Deep Recurrent Survival Machine. Results showed that DL models utilizing recurrent architectures outperformed both ML and linear approaches in concordance and time-dependent AUC, especially for hospitalization, which proved to be the harder event to predict. This study is, to our knowledge, the first to apply deep survival analysis on longitudinal EHR data to jointly predict multiple time-to-event outcomes in COPD patients, highlighting the potential of DL approaches to capture temporal patterns and improve risk stratification.
Deep Learning Approach for Clinical Risk Identification Using Transformer Modeling of Heterogeneous EHR Data
This study proposes a Transformer-based longitudinal modeling method to address challenges in clinical risk classification with heterogeneous Electronic Health Record (EHR) data, including irregular temporal patterns, large modality differences, and complex semantic structures. The method takes multi-source medical features as input and employs a feature embedding layer to achieve a unified representation of structured and unstructured data. A learnable temporal encoding mechanism is introduced to capture dynamic evolution under uneven sampling intervals. The core model adopts a multi-head self-attention structure to perform global dependency modeling on longitudinal sequences, enabling the aggregation of long-term trends and short-term fluctuations across different temporal scales. To enhance semantic representation, a semantic-weighted pooling module is designed to assign adaptive importance to key medical events, improving the discriminative ability of risk-related features. Finally, a linear mapping layer generates individual-level risk scores. Experimental results show that the proposed model outperforms traditional machine learning and temporal deep learning models in accuracy, recall, precision, and F1-Score, achieving stable and precise risk identification in multi-source heterogeneous EHR environments and providing an efficient and reliable framework for clinical intelligent decision-making.
A semantic-based deep learning approach for mathematical expression retrieval
Mathematical expressions (MEs) have complex two-dimensional structures in which symbols can be present at any nested depth like superscripts, subscripts, above, below etc. As MEs are represented using LaTeX format, several text retrieval methods based on string matching, vector space models etc., have also been applied for ME retrieval problem in the literature. As these methods are based on syntactic similarity, recently deep learning approaches based on embedding have been used for semantic similarity. In our present work, we have focused on the retrieval of mathematical expressions using deep learning approaches. In our approach, semantic features are extracted from the MEs using a deep recurrent neural network (DRNN) and these features have been used for matching and retrieval. We have trained the network for a classification task which determines the complexity of an ME. ME complexity has been quantified in terms of its nested depth. Based on the nested depth, we have considered three complexity classes of MEs: Simple, Medium and Complex. After training the network, outputs just before the the final fully connected layer are extracted for all the MEs. These outputs form the semantic features of MEs and are stored in a database. For a given ME query, its semantic features are computed using the trained DRNN and matched against the semantic feature database. Matching is performed based on the standard euclidean distance and top 'k' nearest matches are retrieved, where 'k' is a user-defined parameter. Our approach has been illustrated on a database of 829 MEs.
Deep Learning Approach to Anomaly Detection in Enterprise ETL Processes with Autoencoders
Chen, Xin, Gadgil, Saili Uday, Gao, Kangning, Hu, Yi, Nie, Cong
An anomaly detection method based on deep autoencoders is proposed to address anomalies that often occur in enterprise-level ETL data streams. The study first analyzes multiple types of anomalies in ETL processes, including delays, missing values, duplicate loading, and sudden abnormal changes, and applies data standardization and feature modeling to ensure stable and usable inputs. In the method design, the encoder-decoder structure compresses high-dimensional inputs into latent representations and reconstructs them, while reconstruction error is used to measure anomaly levels. Regularization constraints are introduced in the latent space to enhance feature sparsity and distribution learning, thereby improving robustness in complex data streams. Systematic analyses under different hyperparameter settings, environmental changes, and data characteristics show that the proposed method achieves superior performance in AUC, ACC, Precision, and Recall. The results demonstrate that the deep autoencoder-based detection mechanism can effectively capture latent distribution patterns in enterprise-level ETL data streams and accurately identify diverse anomalies, providing reliable support for enterprise data processing and intelligent analysis.
SoK: Systematic analysis of adversarial threats against deep learning approaches for autonomous anomaly detection systems in SDN-IoT networks
Yasarathna, Tharindu Lakshan, Le-Khac, Nhien-An
Integrating SDN and the IoT enhances network control and flexibility. DL-based AAD systems improve security by enabling real-time threat detection in SDN-IoT networks. However, these systems remain vulnerable to adversarial attacks that manipulate input data or exploit model weaknesses, significantly degrading detection accuracy. Existing research lacks a systematic analysis of adversarial vulnerabilities specific to DL-based AAD systems in SDN-IoT environments. This SoK study introduces a structured adversarial threat model and a comprehensive taxonomy of attacks, categorising them into data, model, and hybrid-level threats. Unlike previous studies, we systematically evaluate white, black, and grey-box attack strategies across popular benchmark datasets. Our findings reveal that adversarial attacks can reduce detection accuracy by up to 48.4%, with Membership Inference causing the most significant drop. C&W and DeepFool achieve high evasion success rates. However, adversarial training enhances robustness, and its high computational overhead limits the real-time deployment of SDN-IoT applications. We propose adaptive countermeasures, including real-time adversarial mitigation, enhanced retraining mechanisms, and explainable AI-driven security frameworks. By integrating structured threat models, this study offers a more comprehensive approach to attack categorisation, impact assessment, and defence evaluation than previous research. Our work highlights critical vulnerabilities in existing DL-based AAD models and provides practical recommendations for improving resilience, interpretability, and computational efficiency. This study serves as a foundational reference for researchers and practitioners seeking to enhance DL-based AAD security in SDN-IoT networks, offering a systematic adversarial threat model and conceptual defence evaluation based on prior empirical studies.
treeX: Unsupervised Tree Instance Segmentation in Dense Forest Point Clouds
Burmeister, Josafat-Mattias, Tockner, Andreas, Reder, Stefan, Engel, Markus, Richter, Rico, Mund, Jan-Peter, Döllner, Jürgen
Close-range laser scanning provides detailed 3D captures of forest stands but requires efficient software for processing 3D point cloud data and extracting individual trees. Although recent studies have introduced deep learning methods for tree instance segmentation, these approaches require large annotated datasets and substantial computational resources. As a resource-efficient alternative, we present a revised version of the treeX algorithm, an unsupervised method that combines clustering-based stem detection with region growing for crown delineation. While the original treeX algorithm was developed for personal laser scanning (PLS) data, we provide two parameter presets, one for ground-based laser scanning (stationary terrestrial - TLS and PLS), and one for UAV-borne laser scanning (ULS). We evaluated the method on six public datasets (FOR-instance, ForestSemantic, LAUTx, NIBIO MLS, TreeLearn, Wytham Woods) and compared it to six open-source methods (original treeX, treeiso, RayCloudTools, ForAINet, SegmentAnyTree, TreeLearn). Compared to the original treeX algorithm, our revision reduces runtime and improves accuracy, with instance detection F$_1$-score gains of +0.11 to +0.49 for ground-based data. For ULS data, our preset achieves an F$_1$-score of 0.58, whereas the original algorithm fails to segment any correct instances. For TLS and PLS data, our algorithm achieves accuracy similar to recent open-source methods, including deep learning. Given its algorithmic design, we see two main applications for our method: (1) as a resource-efficient alternative to deep learning approaches in scenarios where the data characteristics align with the method design (sufficient stem visibility and point density), and (2) for the semi-automatic generation of labels for deep learning models. To enable broader adoption, we provide an open-source Python implementation in the pointtree package.
Doctoral Thesis: Geometric Deep Learning For Camera Pose Prediction, Registration, Depth Estimation, and 3D Reconstruction
Modern deep learning developments create new opportunities for 3D mapping technology, scene reconstruction pipelines, and virtual reality development. Despite advances in 3D deep learning technology, direct training of deep learning models on 3D data faces challenges due to the high dimensionality inherent in 3D data and the scarcity of labeled datasets. Structure-from-motion (SfM) and Simultaneous Localization and Mapping (SLAM) exhibit robust performance when applied to structured indoor environments but often struggle with ambiguous features in unstructured environments. These techniques often struggle to generate detailed geometric representations effective for downstream tasks such as rendering and semantic analysis. Current limitations require the development of 3D representation methods that combine traditional geometric techniques with deep learning capabilities to generate robust geometry-aware deep learning models. The dissertation provides solutions to the fundamental challenges in 3D vision by developing geometric deep learning methods tailored for essential tasks such as camera pose estimation, point cloud registration, depth prediction, and 3D reconstruction. The integration of geometric priors or constraints, such as including depth information, surface normals, and equivariance into deep learning models, enhances both the accuracy and robustness of geometric representations. This study systematically investigates key components of 3D vision, including camera pose estimation, point cloud registration, depth estimation, and high-fidelity 3D reconstruction, demonstrating their effectiveness across real-world applications such as digital cultural heritage preservation and immersive VR/AR environments.
Prediction of mortality and resource utilization in critical care: a deep learning approach using multimodal electronic health records with natural language processing techniques
Ruan, Yucheng, Lan, Xiang, Tan, Daniel J., Abdullah, Hairil Rizal, Feng, Mengling
Background Predicting mortality and resource utilization from electronic health records (EHRs) is challenging yet crucial for optimizing patient outcomes and managing costs in intensive care unit (ICU). Existing approaches predominantly focus on structured EHRs, often ignoring the valuable clinical insights in free-text notes. Additionally, the potential of textual information within structured data is not fully leveraged. This study aimed to introduce and assess a deep learning framework using natural language processing techniques that integrates multimodal EHRs to predict mortality and resource utilization in critical care settings. Methods Utilizing two real-world EHR datasets, we developed and evaluated our model on three clinical tasks with leading existing methods. We also performed an ablation study on three key components in our framework: medical prompts, free-texts, and pre-trained sentence encoder. Furthermore, we assessed the model's robustness against the corruption in structured EHRs. Results Our experiments on two real-world datasets across three clinical tasks showed that our proposed model improved performance metrics by 1.6\%/0.8\% on BACC/AUROC for mortality prediction, 0.5%/2.2% on RMSE/MAE for LOS prediction, 10.9%/11.0% on RMSE/MAE for surgical duration estimation compared to the best existing methods. It consistently demonstrated superior performance compared to other baselines across three tasks at different corruption rates. Conclusions The proposed framework is an effective and accurate deep learning approach for predicting mortality and resource utilization in critical care. The study also highlights the success of using prompt learning with a transformer encoder in analyzing multimodal EHRs. Importantly, the model showed strong resilience to data corruption within structured data, especially at high corruption levels.
Calibrating Biophysical Models for Grape Phenology Prediction via Multi-Task Learning
Solow, William, Saisubramanian, Sandhya
Accurate prediction of grape phenology is essential for timely vineyard management decisions, such as scheduling irrigation and fertilization, to maximize crop yield and quality. While traditional biophysical models calibrated on historical field data can be used for season-long predictions, they lack the precision required for fine-grained vineyard management. Deep learning methods are a compelling alternative but their performance is hindered by sparse phenology datasets, particularly at the cultivar level. We propose a hybrid modeling approach that combines multi-task learning with a recurrent neural network to parameterize a differentiable biophysical model. By using multi-task learning to predict the parameters of the biophysical model, our approach enables shared learning across cultivars while preserving biological structure, thereby improving the robustness and accuracy of predictions. Empirical evaluation using real-world and synthetic datasets demonstrates that our method significantly outperforms both conventional biophysical models and baseline deep learning approaches in predicting phenologi-cal stages, as well as other crop state variables such as cold-hardiness and wheat yield.
Deep Learning Approaches for Multimodal Intent Recognition: A Survey
Zhao, Jingwei, Wen, Yuhua, Li, Qifei, Hu, Minchi, Zhou, Yingying, Xue, Jingyao, Wu, Junyang, Gao, Yingming, Wen, Zhengqi, Tao, Jianhua, Li, Ya
Intent recognition aims to identify users' underlying intentions, traditionally focusing on text in natural language processing. With growing demands for natural human-computer interaction, the field has evolved through deep learning and multimodal approaches, incorporating data from audio, vision, and physiological signals. Recently, the introduction of Transformer-based models has led to notable breakthroughs in this domain. This article surveys deep learning methods for intent recognition, covering the shift from unimodal to multimodal techniques, relevant datasets, methodologies, applications, and current challenges. It provides researchers with insights into the latest developments in multimodal intent recognition (MIR) and directions for future research.