mspe
Integrative Learning of Dynamically Evolving Multiplex Graphs and Nodal Attributes Using Neural Network Gaussian Processes with an Application to Dynamic Terrorism Graphs
Rodriguez-Acosta, Jose, Guha, Sharmistha, Patel, Lekha, Shuler, Kurtis
Exploring the dynamic co-evolution of multiplex graphs and nodal attributes is a compelling question in criminal and terrorism networks. This article is motivated by the study of dynamically evolving interactions among prominent terrorist organizations, considering various organizational attributes like size, ideology, leadership, and operational capacity. Statistically principled integration of multiplex graphs with nodal attributes is significantly challenging due to the need to leverage shared information within and across layers, account for uncertainty in predicting unobserved links, and capture temporal evolution of node attributes. These difficulties increase when layers are partially observed, as in terrorism networks where connections are deliberately hidden to obscure key relationships. To address these challenges, we present a principled methodological framework to integrate the multiplex graph layers and nodal attributes. The approach employs time-varying stochastic latent factor models, leveraging shared latent factors to capture graph structure and its co-evolution with node attributes. Latent factors are modeled using Gaussian processes with an infinitely wide deep neural network-based covariance function, termed neural network Gaussian processes (NN-GP). The NN-GP framework on latent factors exploits the predictive power of Bayesian deep neural network architecture while propagating uncertainty for reliability. Simulation studies highlight superior performance of the proposed approach in achieving inferential objectives. The approach, termed as dynamic joint learner, enables predictive inference (with uncertainty) of diverse unobserved dynamic relationships among prominent terrorist organizations and their organization-specific attributes, as well as clustering behavior in terms of friend-and-foe relationships, which could be informative in counter-terrorism research.
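The abstract above describes a covariance function derived from an infinitely wide deep neural network. As a rough illustration only, the sketch below computes the well-known closed-form NN-GP kernel recursion for a fully connected network with ReLU activations. The choice of ReLU, the function name `nngp_relu_kernel`, and the parameters `depth`, `sigma_w2`, and `sigma_b2` are assumptions made for this example; the listing does not say which architecture or kernel the authors use.

```python
import math

def nngp_relu_kernel(x, y, depth=3, sigma_w2=2.0, sigma_b2=0.0):
    """Kernel of an infinitely wide ReLU network with `depth` hidden layers.

    Hypothetical helper for illustration; ReLU is an assumed activation.
    """
    d = len(x)
    # Input-layer covariances (weight variance scaled by input dimension).
    kxx = sigma_b2 + sigma_w2 * sum(a * a for a in x) / d
    kyy = sigma_b2 + sigma_w2 * sum(b * b for b in y) / d
    kxy = sigma_b2 + sigma_w2 * sum(a * b for a, b in zip(x, y)) / d
    for _ in range(depth):
        norm = math.sqrt(kxx * kyy)
        # Guard against zero norm and rounding slightly outside [-1, 1].
        cos_t = 0.0 if norm == 0.0 else max(-1.0, min(1.0, kxy / norm))
        theta = math.acos(cos_t)
        # Arc-cosine (degree 1) update for the cross term.
        kxy = sigma_b2 + sigma_w2 / (2 * math.pi) * norm * (
            math.sin(theta) + (math.pi - theta) * cos_t
        )
        # Same update with theta = 0 for the diagonal terms.
        kxx = sigma_b2 + sigma_w2 / 2 * kxx
        kyy = sigma_b2 + sigma_w2 / 2 * kyy
    return kxy
```

With `sigma_w2=2.0` and `sigma_b2=0.0` the diagonal value is preserved from layer to layer, which makes the kernel easy to sanity-check. In a latent factor model of the kind described, such a kernel would be evaluated on time points (or other inputs) to build the Gaussian process covariance matrix.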
- South America > Colombia (0.28)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Texas (0.04)
- Information Technology > Modeling & Simulation (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
Supplementary Materials of "BAST: Bayesian Additive Regression Spanning Trees for Complex Constrained Domain"
These appendices provide supplementary details and results of BAST. Appendix A contains additional details on Bayesian estimation and prediction. Prediction at u is then performed as stated in Section 3.2. The experiment setup is the same as in Section 4.1. Table S3 shows the performance of BAST and BART using the hyperparameters chosen by CV (referred to as BAST-cv and BART-cv, respectively). As a benchmark, the performance metrics for BAST and BART using the hyperparameters in Section 4.1 are also included (referred to as Standard errors are given in parentheses.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.41)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution
Although Vision Transformers (ViTs) have recently advanced computer vision tasks significantly, an important real-world problem was overlooked: adapting to variable input resolutions. Typically, images are resized to a fixed resolution, such as 224x224, for efficiency during training and inference. However, uniform input size conflicts with real-world scenarios where images naturally vary in resolution. In this work, we propose to enhance the model adaptability to resolution variation by optimizing the patch embedding. The proposed method, called Multi-Scale Patch Embedding (MSPE), substitutes the standard patch embedding with multiple variable-sized patch kernels and selects the best parameters for different resolutions, eliminating the need to resize the original image. Our method does not require high-cost training or modifications to other parts, making it easy to apply to most ViT models.
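The idea of keeping several patch kernels and picking one per input resolution can be pictured with a toy selection rule. The sketch below is not the authors' selection procedure: it assumes a hypothetical set of available kernel sizes and simply picks the size that brings the token grid closest to a target grid. The names `select_patch_size`, `token_grid`, `grid`, and `kernel_sizes` are illustrative.

```python
def select_patch_size(height, width, grid=14, kernel_sizes=(8, 16, 32)):
    """Pick the kernel size whose token grid is closest to `grid` x `grid`.

    Hypothetical rule for illustration; MSPE's actual rule may differ.
    """
    # Ideal (possibly fractional) patch size for the target token grid.
    target = (height / grid + width / grid) / 2
    return min(kernel_sizes, key=lambda k: abs(k - target))

def token_grid(height, width, patch):
    """Number of non-overlapping patches along each image axis."""
    return height // patch, width // patch
```

For example, a 224x224 input with the default settings selects a patch size of 16 and yields a 14x14 token grid, while a 112x112 input selects 8 and keeps the same grid without resizing the image.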
Improving the Convergence Rates of Forward Gradient Descent with Repeated Sampling
Dexheimer, Niklas, Schmidt-Hieber, Johannes
Forward gradient descent (FGD) has been proposed as a biologically more plausible alternative to gradient descent, as it can be computed without a backward pass. Considering the linear model with $d$ parameters, previous work has found, however, that the prediction error of FGD is slower by a factor of $d$ than the prediction error of stochastic gradient descent (SGD). In this paper we show that by computing $\ell$ FGD steps based on each training sample, this suboptimality factor becomes $d/(\ell \wedge d)$, and thus the suboptimality of the rate disappears if $\ell \gtrsim d$. We also show that FGD with repeated sampling can adapt to low-dimensional structure in the input distribution. The main mathematical challenge lies in controlling the dependencies arising from the repeated sampling process.
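A minimal sketch of forward gradient descent with repeated sampling for the linear model may help make the abstract concrete. It assumes squared loss, standard Gaussian random directions, and a constant learning rate; the function name `fgd_repeated` and the parameters `ell`, `lr`, and `seed` are illustrative, and the exact update and step-size schedule analysed in the paper may differ.

```python
import random

def fgd_repeated(samples, d, ell, lr, seed=0):
    """Run `ell` forward-gradient steps on each (x, y) training sample.

    Sketch under assumptions: squared loss, Gaussian directions, fixed lr.
    """
    rng = random.Random(seed)
    theta = [0.0] * d
    for x, y in samples:
        for _ in range(ell):
            # Fresh random direction for every step on the same sample.
            xi = [rng.gauss(0.0, 1.0) for _ in range(d)]
            residual = y - sum(xj * tj for xj, tj in zip(x, theta))
            # Directional derivative of 0.5 * (y - x.theta)^2 along xi;
            # only a forward evaluation is needed, no backward pass.
            dir_deriv = -residual * sum(xj * zj for xj, zj in zip(x, xi))
            # Step along xi scaled by the directional derivative.
            theta = [tj - lr * dir_deriv * zj for tj, zj in zip(theta, xi)]
    return theta
```

Averaged over the random direction, each step equals a plain gradient step, since the expectation of `dir_deriv * xi` is the gradient. Setting `ell=1` gives FGD without repetition, and larger `ell` reuses each training sample with new directions.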
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- North America > Canada > Quebec > Montreal (0.04)
Invariant Subspace Decomposition
Lazzaretto, Margherita, Peters, Jonas, Pfister, Niklas
We consider the task of predicting a response Y from a set of covariates X in settings where the conditional distribution of Y given X changes over time. For this to be feasible, assumptions on how the conditional distribution changes over time are required. Existing approaches assume, for example, that changes occur smoothly over time so that short-term prediction using only the recent past becomes feasible. In this work, we propose a novel invariance-based framework for linear conditionals, called Invariant Subspace Decomposition (ISD), that splits the conditional distribution into a time-invariant and a residual time-dependent component. As we show, this decomposition can be utilized both for zero-shot and time-adaptation prediction tasks, that is, settings where either no or a small amount of training data is available at the time points we want to predict Y at, respectively. We propose a practical estimation procedure, which automatically infers the decomposition using tools from approximate joint matrix diagonalization. Furthermore, we provide finite sample guarantees for the proposed estimator and demonstrate empirically that it indeed improves on approaches that do not use the additional invariant structure.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Asia > Japan (0.04)
Supplementary Materials of "BAST: Bayesian Additive Regression Spanning Trees for Complex Constrained Domain"
These appendices provide supplementary details and results of BAST. Appendix A contains additional details on Bayesian estimation and prediction. Supplementary simulation details and results including hyperparameter tuning and computation time can be found in Appendix B. Finally, Appendix C provides the proof of Proposition 1. Appendix A.1 Estimation This appendix provides details on the Markov chain Monte Carlo (MCMC) algorithm discussed in Section 3.1. This probability specification works well in our experiments, but one can modify it if desired. Appendix A.2 Prediction in Two-dimensional Constrained Domains In this subsection we provide details on specifying the neighbor set N To sample the cluster membership of u, we need to determine the cluster memberships for vertices on the domain boundary, which can be done by, for instance, assigning a boundary vertex to the same cluster as its nearest vertex in S with respect to the graph distance in the CDT mesh (when the number of vertices in the CDT graph is large, we expect this to well approximate the geodesic distance).
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.41)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
Sparse high-dimensional linear mixed modeling with a partitioned empirical Bayes ECM algorithm
Zgodic, Anja, Bai, Ray, Zhang, Jiajia, McLain, Alexander C.
While high-dimensional data has been ubiquitous for some time, the use of longitudinal high-dimensional data or grouped (clustered) high-dimensional data has been recently increasing in research. For example, some genetic studies gather gene expression levels for an individual on multiple occasions in response to an exposure over time (Banchereau et al., 2016). Other ongoing studies - like the UK Biobank and the Adolescent Brain Cognitive Development Study - collect high-dimensional genetic/imaging information longitudinally to learn how individual changes in these markers are related to outcomes (Cole, 2020; Saragosa-Harris et al., 2022). Such data usually violate the traditional linear regression assumption that observations are independently and identically distributed. Data analysis should account for the dependence between observations belonging to the same individual. For the low-dimensional setting where $n \gg p$, extensive methodology is available for handling such data structures, e.g., linear mixed models (LMMs). The fields of LMMs and high-dimensional linear regression have extensive bodies of literature. However, they are largely separate, with a very narrow body of literature existing at the intersection of LMMs and high-dimensional longitudinal data (where $p \gg n$). Unlike low-dimensional ($p \ll n$) LMMs for which restricted maximum likelihood (REML) methods are readily available, fitting high-dimensional LMMs is considerably more challenging due to the non-convexity of the optimization function, which requires the inversion of large matrices in addition to iterative approaches. The few available methods for high-dimensional LMMs rely on sparsity-inducing penalizations (e.g.
- North America > United States > South Carolina (0.04)
- North America > United States > New York (0.04)
- North America > United States > Texas (0.04)
- North America > United States > Massachusetts (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)