high dimensional
End to End Autoencoder MLP Framework for Sepsis Prediction
Cai, Hejiang, Wu, Di, Xu, Ji, Liu, Xiang, Zhu, Yiziting, Shu, Xin, Li, Yujie, Yi, Bin
Sepsis is a life-threatening condition that requires timely detection in intensive care settings. Traditional machine learning approaches, including Naive Bayes, Support Vector Machine (SVM), Random Forest, and XGBoost, often rely on manual feature engineering and struggle with the irregular, incomplete time-series data commonly present in electronic health records. We introduce an end-to-end deep learning framework integrating an unsupervised autoencoder for automatic feature extraction with a multilayer perceptron classifier for binary sepsis risk prediction. To enhance clinical applicability, we implement a customized downsampling strategy that extracts high-information-density segments during training and a non-overlapping dynamic sliding window mechanism for real-time inference. Preprocessed time-series data are represented as fixed-dimension vectors with explicit missingness indicators, mitigating bias and noise. We validate our approach on three ICU cohorts. Our end-to-end model achieves accuracies of 74.6%, 80.6%, and 93.5%, respectively, consistently outperforming traditional machine learning baselines. These results demonstrate the framework's superior robustness, generalizability, and clinical utility for early sepsis detection across heterogeneous ICU environments.
- Research Report > Experimental Study (0.95)
- Research Report > New Finding (0.67)
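The representation described in the abstract above (fixed-dimension vectors with explicit missingness indicators, fed from non-overlapping windows) can be illustrated with a minimal sketch. The zero-fill, the fixed window width, and the function names `window_to_vector` and `non_overlapping_windows` are assumptions made for illustration; the abstract does not specify the imputation rule or how the "dynamic" window size is chosen.

```python
import numpy as np

def window_to_vector(window):
    """Flatten one (time, variables) window into values plus a 0/1 missingness mask.

    NaN marks a missing measurement. Missing entries are zero-filled
    (an assumption) and the mask records where the fill happened.
    """
    window = np.asarray(window, dtype=float)
    mask = np.isnan(window)
    filled = np.where(mask, 0.0, window)
    return np.concatenate([filled.ravel(), mask.ravel().astype(float)])

def non_overlapping_windows(series, width):
    """Yield consecutive windows that share no rows; a trailing partial window is dropped."""
    series = np.asarray(series, dtype=float)
    for start in range(0, series.shape[0] - width + 1, width):
        yield series[start:start + width]

# Toy ICU stay: 5 time steps, 2 variables, NaN = not measured.
stay = np.array([
    [80.0, np.nan],
    [82.0, 36.9],
    [np.nan, 37.2],
    [90.0, np.nan],
    [95.0, 38.1],
])
vectors = [window_to_vector(w) for w in non_overlapping_windows(stay, width=2)]
```

Each window of width 2 over 2 variables becomes a vector of length 8: four values followed by four indicators, so the downstream model can distinguish a true zero from a filled gap.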
High dimensional, tabular deep learning with an auxiliary knowledge graph
Machine learning models exhibit strong performance on datasets with abundant labeled samples. Here, our key insight is that there is often abundant, auxiliary domain information describing input features which can be structured as a heterogeneous knowledge graph (KG). We propose PLATO, a method that achieves strong performance on tabular data with d \gg n by using an auxiliary KG describing input features to regularize a multilayer perceptron (MLP). PLATO is based on the inductive bias that two input features corresponding to similar nodes in the auxiliary KG should have similar weight vectors in the MLP's first layer. Across 6 d \gg n datasets, PLATO outperforms 13 state-of-the-art baselines by up to 10.19%.
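The inductive bias stated above (features that are similar in the KG should have similar first-layer weight vectors) can be illustrated with a graph-Laplacian smoothness penalty. This is a generic stand-in chosen for illustration, not PLATO's actual mechanism, which the abstract does not spell out; the similarity matrix `S` and the function name `kg_smoothness_penalty` are hypothetical.

```python
import numpy as np

def kg_smoothness_penalty(first_layer, similarity):
    """Laplacian penalty: 0.5 * sum_ij S_ij * ||w_i - w_j||^2 = trace(W^T L W).

    first_layer: (d, h) array; row i is the first-layer weight vector of feature i.
    similarity:  (d, d) symmetric, non-negative feature similarity taken from the KG
                 (hypothetical input; how it is derived is not specified here).
    """
    laplacian = np.diag(similarity.sum(axis=1)) - similarity
    return float(np.trace(first_layer.T @ laplacian @ first_layer))

# Toy case: features 0 and 1 are neighbours in the KG, feature 2 is isolated.
S = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])
W_close = np.array([[1.0, 2.0], [1.0, 2.0], [-5.0, 9.0]])
W_far = np.array([[1.0, 2.0], [4.0, 6.0], [-5.0, 9.0]])

penalty_close = kg_smoothness_penalty(W_close, S)  # neighbours share weights
penalty_far = kg_smoothness_penalty(W_far, S)      # neighbours disagree
```

The penalty is zero when KG neighbours share a weight vector and grows with their squared distance; the isolated feature is unconstrained. Added to a training loss, a term like this would pull the weights of related features together, which is one way to limit effective model capacity when d \gg n.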
86e8f7ab32cfd12577bc2619bc635690-Reviews.html
This paper proposes to use random projections as a proxy to learn BEBFs (Bellman Error Basis Functions). Given a (high dimensional) set of features and the currently estimated value function, the features are (randomly) projected onto a smaller space, and the temporal-difference errors (related to the currently estimated value function) are regressed on these projected features. The (scalar) regressed function is then added to the set of features used to estimate the value function. A finite sample analysis is conducted, the main result showing that if the Bellman residual is linear in the (high dimensional) features, then the Bellman error can be well regressed on the compressed space (depending notably on the size of this space and on the number of samples). The authors also use this result to provide some guarantee on the estimated value function.
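The procedure summarized in the review above can be sketched in a few lines: project the features at random, regress the temporal-difference errors on the projected features by least squares, and append the fitted scalar function as a new basis function. The Gaussian projection, the synthetic data, and the name `fit_bebf` are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def fit_bebf(phi, td_errors, proj_dim, rng):
    """Fit one Bellman Error Basis Function on randomly projected features.

    phi:       (n, D) high-dimensional features of the sampled states.
    td_errors: (n,) temporal-difference errors of the current value estimate.
    Returns a function mapping (m, D) features to the new scalar feature.
    """
    n_features = phi.shape[1]
    # Gaussian random projection D -> proj_dim (an assumed choice of projection).
    projection = rng.normal(scale=1.0 / np.sqrt(proj_dim), size=(n_features, proj_dim))
    coef, *_ = np.linalg.lstsq(phi @ projection, td_errors, rcond=None)
    direction = projection @ coef
    return lambda features: features @ direction

rng = np.random.default_rng(0)
n_samples, n_features, proj_dim, gamma = 200, 500, 20, 0.9
phi = rng.normal(size=(n_samples, n_features))       # features of states s
phi_next = rng.normal(size=(n_samples, n_features))  # features of next states s'
rewards = rng.normal(size=n_samples)

# Current value estimate (all zeros to start), evaluated at s and s'.
values, values_next = np.zeros(n_samples), np.zeros(n_samples)
td_errors = rewards + gamma * values_next - values

bebf = fit_bebf(phi, td_errors, proj_dim, rng)
basis = np.column_stack([np.ones(n_samples), bebf(phi)])            # basis at s
basis_next = np.column_stack([np.ones(n_samples), bebf(phi_next)])  # basis at s'
```

In a full loop, the value function would be re-estimated on the enlarged basis, the temporal-difference errors recomputed, and the step repeated with a fresh projection.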
Efficiently Visualizing Large Graphs
Li, Xinyu, Xiao, Yao, Zhou, Yuchen
Most existing graph visualization methods based on dimension reduction are limited to relatively small graphs due to performance issues. In this work, we propose a novel dimension reduction method for graph visualization, called t-Distributed Stochastic Graph Neighbor Embedding (t-SGNE). t-SGNE is specifically designed to visualize cluster structures in the graph. As a variant of the standard t-SNE method, t-SGNE avoids the time-consuming computations of pairwise similarity. Instead, it uses the neighbor structures of the graph to reduce the time complexity from quadratic to linear, thus supporting larger graphs. In addition, to suit t-SGNE, we combined Laplacian Eigenmaps with the shortest path algorithm in graphs to form the graph embedding algorithm ShortestPath Laplacian Eigenmaps Embedding (SPLEE). Performing SPLEE to obtain a high-dimensional embedding of the large-scale graph and then using t-SGNE to reduce its dimension for visualization, we are able to visualize graphs with up to 300K nodes and 1M edges within 5 minutes and achieve approximately 10% improvement in visualization quality. Code and data are available at https://github.com/Charlie-XIAO/embedding-visualization-test.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > China > Shanghai > Shanghai (0.05)
- North America > United States > New York > New York County > New York City (0.04)
High Dimensional Causal Inference with Variational Backdoor Adjustment
Israel, Daniel, Grover, Aditya, Broeck, Guy Van den
Backdoor adjustment is a technique in causal inference for estimating interventional quantities from purely observational data. For example, in medical settings, backdoor adjustment can be used to control for confounding and estimate the effectiveness of a treatment. However, high dimensional treatments and confounders pose a series of potential pitfalls: tractability, identifiability, and optimization. In this work, we take a generative modeling approach to backdoor adjustment for high dimensional treatments and confounders. We cast backdoor adjustment as an optimization problem in variational inference without reliance on proxy variables and hidden confounders. Empirically, our method is able to estimate interventional likelihood in a variety of high dimensional settings, including semi-synthetic X-ray medical data. To the best of our knowledge, this is the first application of backdoor adjustment in which all the relevant variables are high dimensional.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Israel (0.04)
- Research Report > Strength High (0.46)
- Research Report > Experimental Study (0.46)
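For reference, the backdoor adjustment named in the abstract above has a standard form, and the integral over a high dimensional confounder is what makes it hard to compute. The symbols x (treatment), y (outcome), z (confounders), and q are notation assumed here; the lower bound is the generic Jensen bound that holds for any q(z), shown to indicate how the problem can be posed as variational optimization, and is not necessarily the authors' exact objective.

```latex
% Backdoor adjustment: z is a set of confounders satisfying the backdoor criterion
p(y \mid \mathrm{do}(x)) = \int p(y \mid x, z)\, p(z)\, dz

% For any distribution q(z), Jensen's inequality gives a lower bound
\log p(y \mid \mathrm{do}(x)) \ge
  \mathbb{E}_{q(z)}\left[ \log p(y \mid x, z) \right]
  - \mathrm{KL}\left( q(z) \,\|\, p(z) \right)
```

Maximizing the right-hand side over q replaces the intractable integral with an optimization problem, which is the usual motivation for a variational treatment.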
Model Selection With Graphical Neighbour Information
Accurate model selection is a fundamental requirement for statistical analysis (1-5). In many real-world applications of graphical modelling, correct model structure identification is the ultimate objective. Standard model validation procedures such as information theoretic scores and cross validation have demonstrated poor performance in high-dimensional settings. Specialised methods such as EBIC, StARS and RIC have been developed for the explicit purpose of high-dimensional Gaussian graphical model selection. We present a novel model score criterion, Graphical Neighbour Information. This method demonstrates oracle performance in high-dimensional model selection, outperforming the current state-of-the-art in our simulations. The Graphical Neighbour Information criterion has the additional advantage of efficient, closed-form computability, sparing the costly inference of multiple models on data subsamples. We provide a theoretic analysis of the method and benchmark simulations versus the current state of the art.
- North America > United States > Pennsylvania (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
The Landmark Selection Method for Multiple Output Prediction
Balasubramanian, Krishnakumar, Lebanon, Guy
Conditional modeling x \to y is a central problem in machine learning. A substantial research effort is devoted to such modeling when x is high dimensional. We consider, instead, the case of a high dimensional y, where x is either low dimensional or high dimensional. Our approach is based on selecting a small subset y_L of the dimensions of y, and proceeds by modeling (i) x \to y_L and (ii) y_L \to y. Composing these two models, we obtain a conditional model x \to y that possesses convenient statistical properties. Multi-label classification and multivariate regression experiments on several datasets show that this model outperforms the one-vs-all approach as well as several sophisticated multiple output prediction methods.
- Asia > Middle East > Lebanon (0.04)
- North America > United States > New York (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
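The two-stage composition described in the abstract above, x \to y_L followed by y_L \to y, can be sketched with ordinary least squares for both stages. The linear models, the synthetic data, and the hand-picked landmark indices are assumptions for illustration; the abstract does not say how the landmark subset y_L is selected or which model class is used.

```python
import numpy as np

def fit_linear(inputs, targets):
    """Least-squares coefficients mapping inputs to targets."""
    coef, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return coef

def fit_landmark_model(x, y, landmarks):
    """Fit (i) x -> y_L and (ii) y_L -> y for the given landmark columns of y."""
    y_landmark = y[:, landmarks]
    return fit_linear(x, y_landmark), fit_linear(y_landmark, y)

def predict(x, model):
    """Compose the two stages: x -> y_L -> y."""
    x_to_landmark, landmark_to_y = model
    return x @ x_to_landmark @ landmark_to_y

# Synthetic data in which 2 of the 6 output dimensions determine all the others.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
y_landmark = x @ rng.normal(size=(3, 2))
mixing = np.hstack([np.eye(2), rng.normal(size=(2, 4))])
y = y_landmark @ mixing  # columns 0 and 1 of y are the landmarks themselves

model = fit_landmark_model(x, y, landmarks=[0, 1])
y_hat = predict(x, model)
```

On this noise-free toy data the composed model reproduces y exactly; the point of the construction is that only the small x \to y_L model has to be learned from the inputs, while the remaining output dimensions are recovered from the landmarks.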